Agent Memory
A vendor-agnostic, embedding-powered semantic memory system that enables agents to persist knowledge, recall context intelligently, and maintain continuity across workflow executions.
Why Memory Matters
Without memory, every workflow execution starts from scratch. With semantic memory enabled, agents can retrieve relevant past interactions and facts using vector similarity rather than keyword matching.
How It Works
1. Embedding Generation
When memory is stored, the content is converted into a vector embedding using the configured embedding provider. This is fully vendor-agnostic and supports OpenAI, Gemini, HuggingFace, and Ollama (local).
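The provider abstraction can be sketched as a small interface. This is a hypothetical illustration, assuming a single `embed` method per provider; the class and method names are not the actual API, and the fake provider below stands in for real OpenAI/Gemini/HuggingFace/Ollama clients so the example runs offline.

```python
from abc import ABC, abstractmethod


class EmbeddingProvider(ABC):
    """Vendor-agnostic embedding interface (illustrative, not the real API)."""

    @abstractmethod
    def embed(self, text: str) -> list[float]:
        """Return a vector embedding for the given text."""


class FakeProvider(EmbeddingProvider):
    # Stand-in for a real provider client: folds character codes into a
    # fixed-size vector so the sketch is runnable without network access.
    def embed(self, text: str) -> list[float]:
        vec = [0.0] * 8
        for i, ch in enumerate(text):
            vec[i % 8] += ord(ch)
        return vec


provider: EmbeddingProvider = FakeProvider()
vector = provider.embed("remember: the project name is Atlas")
```

Swapping providers only changes which concrete class is constructed; everything downstream sees the same `embed` contract.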
2. Cosine Similarity Search
Before an LLM step runs, the system generates an embedding for the current prompt and compares it against stored memory using cosine similarity to retrieve the most relevant entries.
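Cosine similarity compares the angle between two vectors, so it is insensitive to vector magnitude. A minimal sketch, using toy vectors in place of real embeddings:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Rank stored memories against the current prompt's embedding.
prompt_vec = [0.9, 0.1, 0.0]
memories = {
    "project name is Atlas": [0.8, 0.2, 0.1],
    "user prefers dark mode": [0.1, 0.9, 0.3],
}
ranked = sorted(
    memories,
    key=lambda m: cosine_similarity(prompt_vec, memories[m]),
    reverse=True,
)
```

The highest-ranked entries are the candidates for injection in the next two steps.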
3. Threshold Filtering
Only memories above a similarity threshold are injected. This prevents irrelevant memory pollution and keeps responses grounded.
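Combined with the `memoryTopK` setting shown later, selection amounts to "take the top-k matches, then drop anything under the threshold". A sketch, assuming an illustrative cutoff of 0.75 (not the system default):

```python
def select_memories(
    scored: list[tuple[str, float]],
    top_k: int = 5,
    threshold: float = 0.75,
) -> list[str]:
    # Keep at most top_k entries, and only those at or above the threshold.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [text for text, score in ranked[:top_k] if score >= threshold]


scored = [
    ("project name is Atlas", 0.98),
    ("user prefers dark mode", 0.21),
    ("deploy target is staging", 0.81),
]
selected = select_memories(scored, top_k=2)
```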
4. Prompt Injection
Retrieved memories are injected into the prompt alongside explicit system instructions so that models, including smaller ones, reliably use the stored context.
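The injection step can be pictured as simple prompt assembly. The exact wording of the system instructions is internal to the memory system; the framing below is only an illustration:

```python
def build_prompt(user_prompt: str, memories: list[str]) -> str:
    """Prepend retrieved memories to the prompt (illustrative framing)."""
    if not memories:
        return user_prompt
    memory_block = "\n".join(f"- {m}" for m in memories)
    return (
        "You have access to the following stored memories. "
        "Use them whenever they are relevant to the question.\n"
        f"{memory_block}\n\n"
        f"Question: {user_prompt}"
    )


prompt = build_prompt("What is my project name?", ["project name is Atlas"])
```

When no memories clear the threshold, the prompt passes through unchanged, so the memory feature adds no tokens on misses.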
Using Memory in Workflows
Memory is opt-in per LLM step and configured through the Workflow Builder under Advanced Options.
{
"type": "llm",
"prompt": "What is my project name?",
"useMemory": true,
"memoryTopK": 5
}
Memory Safety & Retention
Retention Policy
Memory is capped per agent (default: 500 entries). Oldest entries are automatically pruned to prevent unbounded growth.
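First-in-first-out pruning at a fixed cap can be sketched with a bounded queue; this is a conceptual model of the policy, not the actual MongoDB implementation:

```python
from collections import deque

MAX_ENTRIES = 500  # per-agent cap described above

# A deque with maxlen gives FIFO pruning: appending past the cap
# silently discards the oldest entry.
memory = deque(maxlen=MAX_ENTRIES)
for i in range(600):
    memory.append(f"entry-{i}")
```

After 600 appends, only the 500 most recent entries remain.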
Token Guard
Injected memory is character-limited to avoid context overflow and excessive token usage.
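One way to picture the guard is a character budget that whole memories are packed into until it runs out. The budget value below is hypothetical; the real limit is configuration-dependent:

```python
MAX_MEMORY_CHARS = 2000  # illustrative budget, not the system default


def truncate_memories(memories: list[str], budget: int = MAX_MEMORY_CHARS) -> list[str]:
    # Include whole memories (never partial ones) until the budget is spent.
    kept, used = [], 0
    for m in memories:
        if used + len(m) > budget:
            break
        kept.append(m)
        used += len(m)
    return kept


out = truncate_memories(["a" * 1500, "b" * 800, "c" * 100])
```

Packing whole entries, in relevance order, avoids injecting a truncated memory that the model might misread.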
Structured Storage
Memory is stored in structured format (user + assistant) for cleaner retrieval and future RAG expansion.
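A stored entry might look like the following document. The field names and values here are purely illustrative of the user + assistant structure, not the actual schema:

```json
{
  "agentId": "agent-123",
  "user": "What is my project name?",
  "assistant": "Your project is called Atlas.",
  "embedding": [0.12, -0.07, 0.31],
  "createdAt": "2025-01-15T10:30:00Z"
}
```

Keeping the user turn and assistant turn as separate fields means retrieval can surface the exchange as a coherent pair rather than a blob of text.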
Vendor-Agnostic by Design
The memory system does not depend on any external vector database. Embeddings are generated via configured providers and stored locally in MongoDB. If an LLM provider does not support embeddings (e.g., Groq), the system automatically falls back to a local embedding provider such as Ollama.
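The fallback behavior can be sketched as a simple capability check. The provider names come from this document; the function and set names are hypothetical:

```python
# Providers known to support embedding generation (per the docs above).
EMBEDDING_CAPABLE = {"openai", "gemini", "huggingface", "ollama"}


def resolve_embedding_provider(llm_provider: str, local_fallback: str = "ollama") -> str:
    """Use the LLM provider's embeddings if available, else a local fallback."""
    if llm_provider.lower() in EMBEDDING_CAPABLE:
        return llm_provider.lower()
    return local_fallback


# Groq has no embedding endpoint, so it falls back to the local provider.
chosen = resolve_embedding_provider("groq")
```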