Three-Tier Memory Architecture for AI Agents Explained

Your AI Agent Doesn't Have a Memory Problem — It Has a Memory Architecture Problem

Most developers building AI agents reach the same early milestone: they plug in a vector database, store some documents, and call it a knowledge base. It works well enough in demos. Then reality hits. Users start asking questions the vector search can't quite answer — not because the data isn't there, but because the retrieval strategy was never designed to handle the full range of how humans actually remember things.

This is the core insight behind the Knowledge-and-Memory-Management (KMM) system developed for the Hermes Agent ecosystem. After running a vector-only knowledge base and discovering its limits firsthand, the architecture evolved into something far more robust: a three-tier memory pipeline that mirrors the way human memory actually works across different levels of specificity and certainty.

The Three Types of Memory Queries Your Agent Will Face

Before diving into the architecture, it's worth understanding why a single retrieval strategy always falls short. Think about how you'd ask a colleague to find something for you. Sometimes you know exactly what you want. Sometimes you have a vague idea. And sometimes you don't even know what you're missing.

These three scenarios map directly to three fundamentally different retrieval challenges:

Precise recall — You know exactly what you're looking for. "Find me that article about orphan pages in gbrain." A keyword or exact-match search handles this perfectly. Semantic search is overkill and may introduce noise.
Fuzzy recall — You remember the general topic but not the specific terms. "There was something about knowledge graphs…" This is where vector-based semantic search earns its keep, bridging the gap between your vague phrasing and the actual content.
Exploratory recall — You don't know what you don't know. "Which of my notes are disconnected from everything else?" This requires something neither keyword search nor vector similarity can provide: relationship traversal across a knowledge graph.
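
The exploratory question above ("which of my notes are disconnected from everything else?") is a structural query, not a text query. As a rough illustration, assuming notes and their links are available as a plain list of names and a list of edges, it might be sketched like this. The note names and the `find_orphans` helper are hypothetical and are not part of gbrain's API.

```python
def find_orphans(notes, links):
    """Return the notes that take part in no link at all, sorted by name."""
    connected = set()
    for source, target in links:
        connected.add(source)
        connected.add(target)
    return sorted(set(notes) - connected)


# Illustrative data only: four notes, one link between two of them.
notes = ["fts5-tuning", "graph-basics", "orphan-pages", "sync-setup"]
links = [("graph-basics", "orphan-pages")]

orphans = find_orphans(notes, links)
print(orphans)
```

Neither keyword matching nor vector similarity can answer this, because the answer depends on which links are missing rather than on what any note says.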

If your agent has only one of these tools, it will handle one kind of query well and struggle with the other two. The solution isn't to pick the best single method; it's to stack all three and let them work together as a cascading pipeline.

Inside the Three-Tier Architecture

The KMM system is built as an extension layer on top of the Hermes Agent ecosystem, using hermes-memory-installer as its foundation. That base layer provides two key components: gbrain, a knowledge graph engine, and Hindsight, a vector memory system. KMM adds four module groups on top — collection, notes and RAG, cloud sync, and knowledge augmentation — but the engineering centerpiece is the three-tier cross-layer recall system.

Tier 1: FTS5 Full-Text Search — Precision First

The first tier queries a local SQLite database using FTS5, SQLite's built-in full-text search extension. FTS5 is fast and lightweight, and it is precise when the user knows what they're looking for. If someone asks for a document by a specific term, name, or phrase, FTS5 will surface it almost instantly, provided those terms actually appear in the stored text.

In the pipeline, FTS5 runs first. If it gets a confident hit, the query stops there. There's no need to burn compute cycles on semantic embedding models when a simple keyword match already found exactly what was needed.
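
A minimal sketch of a Tier 1 lookup, assuming Python's standard `sqlite3` module is linked against an SQLite build with FTS5 enabled (most are). The table name `notes`, its columns, the sample rows, and the `fts_search` helper are illustrative assumptions, not KMM's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An FTS5 virtual table indexes every listed column for full-text search.
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(title, body)")
conn.executemany(
    "INSERT INTO notes (title, body) VALUES (?, ?)",
    [
        ("Orphan pages in gbrain", "How to find pages with no inbound links."),
        ("Cloud sync setup", "Configuring sync for the notes directory."),
    ],
)


def fts_search(conn, query, limit=5):
    """Return (title, score) rows for a full-text match, best match first."""
    # bm25() yields lower (more negative) values for better matches,
    # so ascending order puts the strongest hit first.
    return conn.execute(
        "SELECT title, bm25(notes) AS score FROM notes "
        "WHERE notes MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()


hits = fts_search(conn, "orphan")
misses = fts_search(conn, "kubernetes")
```

An empty result list is the natural signal to escalate to the next tier. Note that FTS5 treats characters such as quotes, hyphens, and colons as query syntax, so raw user input may need escaping before it is passed to `MATCH`.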

Tier 2: Hindsight Vector Semantics — When Keywords Aren't Enough

If FTS5 doesn't return a satisfying result, the pipeline escalates to Hindsight, the vector memory layer. Hindsight encodes documents and queries as text embeddings and finds matches based on semantic similarity rather than exact keyword overlap. This is the layer that handles fuzzy, natural-language queries where the user knows roughly what they want but can't articulate it precisely.

Vector semantic search excels at matching intent rather than literal wording, which makes it the natural middle tier: more forgiving than keyword search, but still limited to content that resembles the query, whereas graph traversal follows explicit relationships between items.
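
The core of a semantic lookup is ranking documents by vector similarity. The sketch below uses cosine similarity over small hand-written vectors that stand in for real embeddings. Hindsight's actual API and embedding model are not described in this article, so `cosine_similarity`, `semantic_search`, and the sample vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, doc_vecs, top_k=3):
    """Return up to top_k (doc_id, similarity) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors; a real embedding model would produce these.
doc_vecs = {
    "knowledge-graph-intro": [0.9, 0.1, 0.0],
    "sqlite-backup-guide": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.3, 0.1]  # stand-in for "something about knowledge graphs"

results = semantic_search(query_vec, doc_vecs)
```

The similarity score doubles as a confidence signal: a pipeline can compare the top score against a threshold to decide whether the match is good enough or whether to escalate.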

Tier 3: gbrain Knowledge Graph — Discovering What You Didn't Know to Ask For

The third tier goes beyond text matching altogether. If both FTS5 and Hindsight come up short, the system expands into the gbrain knowledge graph, traversing adjacent nodes to surface related content the user didn't explicitly ask for. This is relationship-driven discovery: finding the notes, concepts, or documents that are meaningfully connected to the query even when there's no direct textual match.

This tier is what separates a smart agent from a simple search tool. Knowledge graphs can answer questions like "what else is related to this concept?" in a way no text-matching algorithm can replicate.
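
Traversing adjacent nodes can be sketched as a bounded breadth-first walk. The example below assumes the graph is available as a plain adjacency dictionary; gbrain's real storage format and query interface are not covered here, so `expand_neighbors`, the `max_hops` parameter, and the node names are hypothetical.

```python
from collections import deque


def expand_neighbors(graph, start, max_hops=2):
    """Breadth-first walk returning {node: hop_distance} within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the starting node is not a "related" result
    return distances


# Illustrative graph: each key lists the notes it links to.
graph = {
    "knowledge-graphs": ["orphan-pages", "entity-linking"],
    "orphan-pages": ["knowledge-graphs", "link-audit"],
    "entity-linking": ["knowledge-graphs"],
    "link-audit": ["orphan-pages", "site-maps"],
    "site-maps": ["link-audit"],
}

related = expand_neighbors(graph, "knowledge-graphs", max_hops=2)
```

Capping the number of hops keeps the result set focused: nodes one hop away are direct neighbors, while nodes beyond the limit are left out even though they are reachable.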

The Pipeline in Practice: lightweight_recall.py

The three tiers are wired together in a single Python script called lightweight_recall.py, located at $AGENT_HOME/scripts/lightweight_recall.py. The implementation uses a KnowledgeManager class with a tiered_search() method that automatically cascades through the three levels based on result confidence. A single call triggers the full pipeline — no manual orchestration required.

The elegance of this design is in its fallback logic. Each tier only activates if the previous one fails to meet a confidence threshold. In the common case, FTS5 handles the query immediately and the heavier semantic and graph layers never run, keeping response times low and compute costs minimal.
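
The fallback logic can be sketched in a few lines. This is not the code from lightweight_recall.py, which is not reproduced in this article. The function name `tiered_search_sketch`, the 0.7 threshold, the (item, confidence) result shape, and the three stub tiers are all assumptions chosen to show the cascade.

```python
def tiered_search_sketch(query, tiers, threshold=0.7):
    """Try each (name, search_fn) in order and stop at the first confident hit.

    Each search_fn takes the query and returns a list of (item, confidence)
    pairs with confidence in [0, 1]. Returns (tier_name, results). If no tier
    is confident, the last tier's results are returned as a best effort.
    """
    name, results = None, []
    for name, search_fn in tiers:
        results = search_fn(query)
        if results and max(conf for _, conf in results) >= threshold:
            return name, results
    return name, results


# Stub tiers standing in for FTS5, Hindsight, and gbrain.
def fake_fts(query):
    return [("orphan-pages", 1.0)] if "orphan" in query.lower() else []


def fake_vector(query):
    if "graph" in query.lower():
        return [("knowledge-graph-intro", 0.82)]
    return [("misc-note", 0.3)]


def fake_graph(query):
    return [("link-audit", 0.5)]


tiers = [("fts5", fake_fts), ("hindsight", fake_vector), ("gbrain", fake_graph)]

precise = tiered_search_sketch("orphan pages in gbrain", tiers)
fuzzy = tiered_search_sketch("something about knowledge graphs", tiers)
exploratory = tiered_search_sketch("what am I missing?", tiers)
```

Because the tiers are ordered from cheapest to most expensive, the precise query returns from the first tier without the other two ever being called.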

Why This Matters for Anyone Building AI Agents

The broader lesson here extends well beyond any specific implementation. As AI agents become more capable and are trusted with larger, more complex knowledge bases, retrieval quality becomes the primary bottleneck. An agent that can only do one kind of recall will always have blind spots — and users will notice.

A three-tier memory architecture acknowledges that memory isn't one thing. It's precise when it needs to be, fuzzy when it has to be, and exploratory when nothing else works. Building agents that can operate across all three modes isn't a nice-to-have — it's the baseline for any system that needs to handle real-world, unpredictable queries reliably.

If you're currently running a vector-only knowledge layer in your agent, the question isn't whether it has gaps. It's how often those gaps are costing you accuracy — and whether you're even measuring that.

Getting Started with Tiered Memory in Your Own Agent

Whether you adopt the KMM architecture directly or draw inspiration from its design principles, the first step is the same: audit the types of queries your agent actually receives. Map them against the three categories — precise, fuzzy, and exploratory — and see where your current retrieval strategy fails. Chances are, one tier alone is handling all three, and doing a poor job of at least one.
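
One way to run that audit, assuming you can hand-label a sample of logged queries with a category and whether the agent answered correctly, is a simple tally of failure rates. The `audit_queries` helper and the sample log below are invented for illustration and are not measurements from any real system.

```python
from collections import Counter


def audit_queries(log):
    """Given (category, answered_ok) pairs, return the failure rate per category."""
    totals, failures = Counter(), Counter()
    for category, answered_ok in log:
        totals[category] += 1
        if not answered_ok:
            failures[category] += 1
    return {category: failures[category] / totals[category] for category in totals}


# Made-up sample: each entry is one hand-labelled query from a log.
sample_log = [
    ("precise", True),
    ("precise", True),
    ("fuzzy", True),
    ("fuzzy", False),
    ("exploratory", False),
    ("exploratory", False),
]

failure_rates = audit_queries(sample_log)
```

A category with a high failure rate points at the tier worth adding first.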

From there, adding FTS5 over a local SQLite store is a low-effort first upgrade. Layering in a vector embedding model for semantic fallback is the natural next step. And integrating a knowledge graph as the final discovery layer turns your agent from a search tool into something that can navigate the structure of its own knowledge.

The chat log was never the right metaphor for agent memory. A tiered architecture built for retrieval is.