
Agent Memory Deep Dive — How OpenClaw Memory Works, Improvements, and Different Approaches
OpenClaw's Markdown-based memory is transparent and debuggable, but has fundamental limitations. Here's what works, what doesn't, and what the alternatives look like.
The more you use OpenClaw, the worse its memory gets. It remembers everything you tell it but understands none of it. The problem isn't storage. It's structure.
That's the brutal diagnosis from the community, and after digging into how OpenClaw's memory actually works, I think it's spot-on. Let me break down why—and what you can do about it.
How OpenClaw Memory Actually Works
OpenClaw takes a refreshingly simple approach: memory is just Markdown files in your agent workspace. The files are the source of truth. The model only "remembers" what gets written to disk. No hidden vector databases, no opaque embeddings pipeline you can't debug.
It uses two memory layers:
- Daily logs — memory/YYYY-MM-DD.md captures what happens in each session
- Curated long-term memory — MEMORY.md for the important stuff you want to keep around
When you need to recall something, you've got two tools:
- memory_search — semantic recall over indexed snippets
- memory_get — targeted read of a specific file or line range
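As a rough sketch, a recall request to either tool might look like this. The tool names come from OpenClaw itself; the exact argument shapes below are assumptions for illustration, not the real API:

```python
# Hypothetical call shapes for OpenClaw's two recall tools.
# Tool names are real; the argument names are illustrative guesses.

search_call = {
    "tool": "memory_search",
    "args": {"query": "why did Alice's login fail?", "max_results": 5},
}

get_call = {
    "tool": "memory_get",
    "args": {"path": "memory/2026-01-10.md", "start_line": 12, "end_line": 30},
}
```

The split matters: memory_search is fuzzy and ranked, memory_get is exact and addressable. Keep that distinction in mind for the rest of this piece.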
Under the hood, search works by chunking your Markdown into ~400-token pieces with 80-token overlap, then embedding them. OpenClaw supports hybrid search combining vector similarity with BM25 keyword matching, which helps when you need exact matches rather than conceptually similar ones. Embeddings are cached, and there's temporal decay (30-day half-life by default) so recent material bubbles up. You can choose embedding providers too—OpenAI, Gemini, or Voyage—or run local embeddings via node-llama-cpp if you want to keep everything on your machine.
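The chunking and decay pieces are simple enough to sketch directly. This is a minimal reimplementation of the two ideas as described (overlapping ~400-token windows, a half-life score multiplier), not OpenClaw's actual code:

```python
def chunk_tokens(tokens, size=400, overlap=80):
    """Split a token list into overlapping chunks: ~400-token pieces
    with an 80-token overlap, matching the parameters described above."""
    step = size - overlap
    return [
        tokens[start:start + size]
        for start in range(0, max(len(tokens) - overlap, 1), step)
    ]

def decayed_score(similarity, note_age_days, half_life_days=30.0):
    """Temporal decay: a note's retrieval score halves every
    half_life_days, so recent notes outrank equally similar old ones."""
    return similarity * 0.5 ** (note_age_days / half_life_days)
```

The overlap is what keeps a fact that straddles a chunk boundary retrievable from at least one chunk; the decay is why a month-old note needs roughly twice the raw similarity of a fresh one to rank equally.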
It also has a clever trick: automatic memory flush before context compaction. When your session gets close to the context window limit, OpenClaw triggers a silent agentic turn that writes durable memories to disk before compaction wipes them from context. It's a smart safety net, but it only works if you've already told the system what matters.
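The trigger logic for that safety net is conceptually just a threshold check. A minimal sketch, assuming a fire-when-near-the-limit rule (the 0.85 threshold here is my assumption; OpenClaw's actual trigger point may differ):

```python
def should_flush_memory(used_tokens, context_limit, threshold=0.85):
    """Fire the silent 'write durable memories to disk' turn once the
    session nears the context limit, before compaction wipes context.
    The 0.85 threshold is illustrative, not OpenClaw's documented value."""
    return used_tokens / context_limit >= threshold
```

The important caveat from above still applies: this only preserves what the agent decides is worth writing. Anything it doesn't recognize as durable is lost at compaction regardless.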
On paper, this is elegant. In practice? There's a fundamental flaw that no amount of clever engineering around embeddings can fully solve.
The Problem: It Remembers Everything, Understands None of It
Vector search gives you similar text. That's not the same as memory.
Here's the issue: OpenClaw's memory can't reason about relationships between facts. It knows Alice exists. It knows authentication exists. But it can't connect them. The retrieval is surface-level—it finds text that sounds related, not information that is related.
Let me make this concrete. Say you've been debugging an auth issue for three days. Day one, you wrote a note about Alice's login failing. Day two, you documented that the token validation logic is in auth.ts. Day three, you figured out the expiry check is wrong. In a proper memory system, the agent would understand: "Alice's issue is caused by the bug in auth.ts token validation." In OpenClaw? It can retrieve all three notes. It just can't connect them.
This is the exact limitation that makes people say "the more you use it, the worse it gets." You're piling more and more facts into a flat file, but there's no structure connecting them. It's like having a notebook where every page has useful notes, but there's no index, no cross-references, no understanding that Page 3 explains the error on Page 7.
The result? Retrieval gets noisier over time. The semantic search still works—it finds similar text. But "similar" doesn't mean "relevant." You get results that look right but miss the actual point. This is why people end up using memory_get to manually pull specific files instead of trusting memory_search. They're working around the system's fundamental limitation.
The Three Approaches to Agent Memory
The AI agent space has converged on three main architectures for memory. Each has trade-offs worth understanding.
1. Built-in Context
Just use the model's context window. Stuff everything in there, let the attention mechanism sort it out. The model sees everything, so it can theoretically connect anything.
Simple, but expensive at scale. Longer contexts = more tokens = higher costs and slower inference. There's also a hard ceiling—you can't exceed the context window, and even when you're close, the model's ability to pay attention to distant information degrades. This works for short projects, but breaks down when you need persistence across weeks or months of work.
2. Mem0
A dedicated memory layer with hierarchical storage, user profiles, and cross-session persistence. Mem0 adds an external service that manages embedding, retrieval, and recall across sessions.
It's more sophisticated—you get user profiles, preferences, conversation history that actually works across sessions. But it adds external dependencies. You're trusting a third party with your agent's memory. It also introduces latency and costs that pure local solutions don't have. Good for products that need to scale across many users; less ideal for personal automation.
3. Three-Tier Memory
The most promising approach—structuring memory into layers that mirror how humans actually work:
- Working Memory — what's in the current context window. Immediate, fleeting, high-bandwidth.
- Episodic Memory — past sessions, events, conversations. What happened, when, with what outcome.
- Reference Memory — organized, long-term knowledge. Facts, patterns, reusable information you can pull up when needed.
You don't remember every detail of every day equally—you have immediate awareness, recent memories you can still recall clearly, and organized knowledge you can access when asked. Your agent should work the same way.
OpenClaw's current system is closest to a simplified three-tier (daily logs act as episodic, MEMORY.md is reference), but it misses the critical piece: structured reference memory. It's still just indexed text, not organized knowledge.
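One way to see the gap is to write the three tiers down as a data structure. This sketch is illustrative (the type names are mine, not any library's); note how the reference tier wants to be keyed, structured knowledge, where OpenClaw's MEMORY.md is one flat text blob:

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    date: str
    summary: str  # what happened, when, with what outcome

@dataclass
class AgentMemory:
    # Working memory: whatever is in the current context window.
    working: list[str] = field(default_factory=list)
    # Episodic memory: past sessions (OpenClaw's daily logs play this role).
    episodic: list[Episode] = field(default_factory=list)
    # Reference memory: organized long-term knowledge, keyed by entity.
    # OpenClaw approximates this with MEMORY.md, but as flat indexed text
    # rather than a structured store.
    reference: dict[str, str] = field(default_factory=dict)
```

Even this toy version makes the difference visible: you look things up in reference memory by name, you scan episodic memory by time, and working memory just is the current turn.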
What's Coming: Improvements and Alternatives
OpenClaw is aware of these limits. The QMD backend (experimental) combines BM25 + vectors + reranking for better retrieval. It runs locally via Bun and offers a more sophisticated search layer. Think of it as better infrastructure for the same fundamental approach—more accurate search, but still flat text.
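A standard way to fuse a keyword ranking with a vector ranking is reciprocal rank fusion. Whether QMD uses RRF specifically is an assumption on my part; this is just the common textbook version of the hybrid trick:

```python
def rrf_fuse(bm25_ranking, vector_ranking, k=60):
    """Reciprocal rank fusion: each ranking contributes 1/(k + rank + 1)
    per document, so items ranked well by BOTH signals float to the top.
    k=60 is the conventional default from the RRF literature."""
    scores = {}
    for ranking in (bm25_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

A reranker would then re-score the top of this fused list with a heavier model. Note what fusion cannot do: it reorders text matches, it doesn't connect facts. That's the next section's job.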
But the real solution emerging is knowledge graphs. Instead of just storing text and finding similar text, you model entities and their relationships. Alice is a User. auth.ts is a Module. The bug is a Defect that affects the TokenValidation function which is called by the Login flow that involves Alice.
Now the agent can traverse relationships, reason about connections, and actually understand what it's remembering. "Show me everything related to Alice's login issue" becomes a graph traversal, not a keyword search.
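To make the traversal idea concrete, here's a toy graph over the Alice example from earlier. The entities and relation names are illustrative, not Cognee's actual schema:

```python
from collections import deque

# A toy knowledge graph: (source, relation) -> targets.
# Schema is illustrative, built from the three-day debugging example.
edges = {
    ("Alice", "experiences"): ["LoginFailure"],
    ("LoginFailure", "caused_by"): ["ExpiryCheckBug"],
    ("ExpiryCheckBug", "located_in"): ["TokenValidation"],
    ("TokenValidation", "defined_in"): ["auth.ts"],
}

def related_to(entity):
    """Breadth-first traversal: every entity reachable from a start node."""
    seen, queue = {entity}, deque([entity])
    while queue:
        node = queue.popleft()
        for (src, _rel), targets in edges.items():
            if src != node:
                continue
            for target in targets:
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    seen.discard(entity)
    return seen
```

Asking related_to("Alice") walks the chain out to auth.ts in four hops. That's the connection the three flat daily-log notes contain but can never surface through similarity search alone.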
There's a Cognee plugin in the works that aims to bring this to OpenClaw. If it works, it could solve the fundamental "remembers but doesn't understand" problem. It won't be simple—knowledge graphs require upfront schema design and consistent entity extraction. But the payoff is real: memory that actually reasons.
The Takeaway
Here's what matters: the agents that win in production will not have the biggest context windows. They will have the smartest memory.
OpenClaw's approach is transparent and debuggable—you can read your memory files, understand what's being stored, and fix it when it breaks. That's valuable. But if you're building something that needs to reason across sessions, understand relationships, and scale meaningfully, raw Markdown + vector search isn't enough.
The fix isn't more context. It's structure. And the OpenClaw team seems to be heading in that direction.
Want to dig into your own agent's memory? Start by reading your memory/ folder. What you find might surprise you.
If you're building something new, here's my advice: start with OpenClaw's simple file-based approach—it's genuinely useful for transparency and debugging. But from day one, think about structure. What are the entities in your agent's world? How do they relate? The moment you need your agent to connect dots across sessions, flat text won't cut it. You'll either need to layer on a knowledge graph, or you'll end up building one yourself the hard way.