Agent Memory: Context That Survives a Restart
Humans have working, long-term, and procedural memory — an agent needs the same. In 2026 memory stopped being an optimization and became an architectural requirement: without it every session starts from scratch, and the user explains the same thing for the third time in a row.
Advanced · AI Agents · 25 min · Mem0, JSON file storage, Claude API
1
Short, long, working — an agent has three memories
In humans, working memory holds the current conversation, long-term memory stores facts about the world, and procedural memory holds skills like riding a bike. An agent has the same structure: context window = working memory, external storage = long-term, system prompt with rules = procedural.
Inside long-term, three more types exist: episodic (what happened in a specific session), semantic (stable facts about the user or project), procedural (how to do a task correctly). Mixing them in one store is a classic mistake: the fact "user prefers TypeScript" lives forever, while "today I fixed an auth bug" is stale in a week. Separate them by type from day one.
Episodic
- What happened in a session
- Example: "yesterday we fixed a Redis bug"
- Goes stale fast
Semantic
- Stable facts
- Example: "project uses Next.js 16, pnpm"
- Changes rarely
Procedural
- How to do X correctly
- Example: "run npm run lint before commit"
- Lives in the system prompt
Mixing types is the first mistake. Store them separately, otherwise the agent will blend facts with experience, and procedural rules will drown in stale events.
2
File or vector DB? Most projects get this wrong
Counterintuitive but true: for most agents, file-based memory outperforms vector retrieval. The reason is simple — if the entire memory fits in 5–10K tokens, loading it whole is cheaper and more reliable than searching for relevant chunks. A vector DB adds infrastructure, retrieval miss risk, and one more service that can fail.
When is a file justified? A personal assistant, a coding agent for one project, a support bot with a fixed FAQ base. When do you actually need a vector DB? When you have thousands of documents, hundreds of users with different context, or a sub-second latency requirement. Choose by real scale, not "future-proofing".
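The "fits in 5–10K tokens" rule of thumb can be made concrete with a quick check. A sketch assuming roughly 4 characters per token, a common approximation for English text:

```python
def fits_in_file(memory_text: str, budget_tokens: int = 10_000) -> bool:
    """True if the whole memory fits the token budget, in which case
    a plain file beats a vector DB. ~4 chars/token is a rough
    heuristic; a real tokenizer gives exact counts."""
    approx_tokens = len(memory_text) / 4
    return approx_tokens <= budget_tokens
```

If this returns True for your entire memory, loading it whole is simpler and more reliable than any retrieval pipeline.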
File-based or vector DB?
Memory < 10K tokens → a file is enough
Single user / single project → a file
Searching across thousands of documents → vector DB
Hard latency requirement (< 1 s) → vector DB
"Future-proofing" without real data → not a reason
Mem0's 2026 benchmark: file-based memory scores 72.9% accuracy vs 66.9% for vector retrieval, but vector answers in 0.2 s instead of 17 s. Pick by your latency requirement, not by memory size.
3
An index, not a dump: how the agent decides what to read
The naive approach: one giant memory.md loaded every session. A month in, it's 30K tokens, half of it stale, the agent burns context on noise. The right pattern is an index: MEMORY.md lists files with one line per file. Details live in separate files, loaded only when needed.
This works like a book's table of contents. You don't read the whole book to find the authentication chapter — you check the TOC and open the right page. The agent does the same: reads the index, picks 1–2 relevant files based on the user's task, loads only those. Context stays lean, and memory scales without quality degradation.
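The index-then-load loop can be sketched in a few lines, assuming a MEMORY.md where each line is `- filename.md: one-line summary` (the format is illustrative). Relevance here is naive keyword overlap; a real agent would ask the LLM to pick files from the index instead.

```python
from pathlib import Path

MEMORY_DIR = Path("memory")

def read_index() -> dict[str, str]:
    """Parse MEMORY.md: each line is '- filename.md: one-line summary'."""
    index = {}
    for line in (MEMORY_DIR / "MEMORY.md").read_text().splitlines():
        line = line.strip().lstrip("- ")
        if ":" in line:
            name, summary = line.split(":", 1)
            index[name.strip()] = summary.strip()
    return index

def load_relevant(task: str, max_files: int = 2) -> dict[str, str]:
    """Pick the few files whose summaries overlap the task, load only those."""
    index = read_index()
    words = set(task.lower().split())
    scored = sorted(
        index,
        key=lambda name: -len(words & set(index[name].lower().split())),
    )
    chosen = scored[:max_files]
    return {name: (MEMORY_DIR / name).read_text() for name in chosen}
```

Only the index plus one or two detail files ever enter the context window; everything else stays on disk.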
Session start
Read MEMORY.md (index)
Pick relevant files
Load only chosen
Work on the task
session_start:
read MEMORY.md (index: list of files + one line per file)
from the user's task → pick the relevant files
load only the chosen ones → not everything in a row
during the session:
learned a new fact → write it to file.md
add a link to MEMORY.md (one line)
4
What to save, and what to squeeze out
Memory is not an event log, it's a distillation of what can't be recovered from the current state of the world. Four types worth saving: user profile (name, role, preferences), feedback and corrections (what the user fixed — this teaches the agent), project state (stack, conventions, open questions), references to external resources.
What NOT to save? Code — it's already in the repo. Git history — `git log` remembers. Intermediate task state — once solved, it's trash. Debugging noise — attempts, errors, reverts. Simple rule: if a fact is recoverable via `grep` or `git log`, it's not memory, it's a duplicate. Save what lives only in the user's head.
| Save | Do not save |
|---|---|
| Profile: name, role, stack | Code — it's already in the repo |
| User corrections and their "not like that" | Git history — it's in `git log` |
| Project conventions, open questions | Intermediate task state |
| Links to external resources and docs | Debug noise: attempts, reverts |
If a fact is recoverable via `git log` or `grep` — don't save it. Memory is for what's NOT in the code.
5
Memory goes stale — and that's more dangerous than having none
An agent without memory re-asks for context every time — slow but safe. An agent with stale memory acts confidently on wrong data — and that's far worse. Typical scenario: memory says "AuthService.ts contains login logic" → the agent edits this file → but three months ago it was renamed to auth/service.ts, logic moved, the old file is an empty shell. The agent confidently breaks what's no longer there.
Three defenses against staleness. First, verify before acting: before relying on a fact from memory, check it in the current state (grep, ls, API call). Memory is a hint, not a source of truth. Second, periodic review: once a month walk through MEMORY.md and delete what's no longer relevant. This can't be automated — it's manual hygiene. Third, immediate deletion on detected error: if the agent finds something wrong in memory, remove it right away, not "fix later". Every stale fact is a landmine under the next session.
Fundamental difference: the current state of code always wins over memory. Memory is a snapshot of the past, code is the truth of the present. Conflict? Always pick the present.
Good habit: before using a fact from memory, verify it in the code. "Memory says X exists" ≠ "X exists now".
Result
You understand that agent memory is an architecture of three types (episodic, semantic, procedural), why a file is more often right than a vector DB, and how an index with separate files scales without context bloat. The key point: memory goes stale, and verifying before acting matters more than the act of saving itself.