Every AI conversation starts from zero. That is the default. The model has no recollection of the previous session, no knowledge of the decisions made last week, no awareness that the same problem occurred twice before and was fixed both times in ways that did not stick.
For a casual use case — drafting an email, summarising a document — this is not a significant limitation. For a production system running 24 specialist agents on continuous operational work, it is a fundamental architectural problem.
The WAT system runs on the assumption that accumulated experience is an asset. A content writer that has produced 50 posts knows things about the voice and the standards that a content writer starting from zero does not. A QA agent that has seen 30 batches of content knows which failure patterns recur and which agents they recur with. A finance controller that has been tracking spend for six months has context about spending patterns that a finance controller who has just been instantiated does not.
None of that accumulated experience exists by default. We had to build it. Here is how.
Persistent memory files
The most important architectural decision in the WAT system is also the simplest: every agent has a persistent memory file.
The memory file lives at a defined path in the file system: agent-memory/{Agent Name}/MEMORY.md. The file is not generated fresh each session — it accumulates entries over time, written by the agent after significant runs, updated after feedback is received, and read at the start of every session before the agent begins work.
The structure of a memory file includes: feedback from previous runs (positive and negative), known issues with the agent’s task types, decisions that have been made and should not be revisited, and task-specific notes from the last time this category of work was done.
The instruction to read the memory file is in the agent’s specification as a mandatory pre-invocation step. It is not optional and it is not skippable. The agent reads the file, understands the accumulated context, and then begins work.
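The mandatory pre-invocation read can be sketched as a small helper. This is an illustrative sketch only: the `MEMORY_ROOT` constant and `load_agent_memory` function are assumed names, not part of the WAT system's actual tooling; only the `agent-memory/{Agent Name}/MEMORY.md` path convention comes from the description above.

```python
from pathlib import Path

# Assumed layout, following the convention described above:
# agent-memory/{Agent Name}/MEMORY.md
MEMORY_ROOT = Path("agent-memory")

def load_agent_memory(agent_name: str) -> str:
    """Mandatory pre-invocation step: read the agent's persistent
    memory file so accumulated context is in scope before work begins."""
    memory_file = MEMORY_ROOT / agent_name / "MEMORY.md"
    if not memory_file.exists():
        # A missing file is a hard failure, not a silent fallback --
        # skipping the memory read is exactly the failure mode to prevent.
        raise FileNotFoundError(f"No memory file for agent '{agent_name}'")
    return memory_file.read_text(encoding="utf-8")
```

The deliberate design choice in this sketch is that a missing memory file raises rather than returning an empty string: the step is "not optional and not skippable", so the absence of memory should stop the run, not quietly degrade it.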
The effect on output quality is significant. An agent with six months of memory is substantively different from an agent starting from scratch — not because the model has changed, but because the context it operates in captures the learning from every previous run.
Session hooks and BUILD_PROGRESS
Memory files address agent-specific continuity. There is a second context problem: system-level state.
The WAT system is building something — a collection of travel sites, a personal brand site, a consulting operation. The state of what has been built, what decisions have been made, what the current priorities are, and what was completed in the last session is not the same as any individual agent’s memory. It is the system’s memory.
For this, we use a build progress log: a running record of what has been built, what decisions have been made (with rationale), the current task queue, and the most recent session summary. At the start of every session, the project-level state is injected into context so that the agent or orchestrator is not operating from a blank slate about the project’s current condition.
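The session-start injection can be sketched as a hook that prepends the project-level state to whatever the session is asked to do. The file name `BUILD_PROGRESS.md` and the function name are assumptions for illustration; the mechanism (automatic injection, not human recall) is the point.

```python
from pathlib import Path

# Assumed file name for the running build progress log.
BUILD_LOG = Path("BUILD_PROGRESS.md")

def build_session_context(task_prompt: str) -> str:
    """Prepend project-level state to the session prompt so no session
    starts from a blank slate about the project's current condition."""
    project_state = (
        BUILD_LOG.read_text(encoding="utf-8") if BUILD_LOG.exists() else ""
    )
    return (
        "## Project state (injected automatically)\n"
        f"{project_state}\n\n"
        "## Current task\n"
        f"{task_prompt}"
    )
```

The ordering matters: the state comes first so that decisions, completed work, and priorities are in context before the task itself is read.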
This is the equivalent of a project status document in a conventional team — the thing a team member reads at the start of a week to understand what the team accomplished last week and what the priorities are this week. The difference is that it is injected automatically, not left to the human to remember to provide.
The failure modes it prevents: an agent confidently proceeding with work that was already completed, proposing architecture changes that conflict with decisions already approved, or asking for information that was provided three sessions ago and is in the build log. These failures are not model failures. They are memory architecture failures. The build log is the fix.
Agent-specific memory with structured frontmatter
Each agent’s memory file is not free-form text. It follows a defined schema.
The schema includes: date of last update, agent role, active project context, recent feedback (scored with accuracy/completeness/usability dimensions), known failure patterns, resolved decisions, and pending tasks. The structured format means the memory can be read efficiently and the relevant sections can be located without parsing the entire file.
This matters for a specific reason: if memory were unstructured prose, an agent with a large memory file would spend a disproportionate share of its context budget parsing old information to find the current-state fields. Structured frontmatter and section headers solve the navigation problem. The agent can read the most recent entries, note the known issues, and proceed.
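A minimal parser shows how the structure makes targeted reads possible. The frontmatter keys and section names in the test are assumptions consistent with the schema described above, not the actual WAT schema; the sketch assumes a `---`-delimited frontmatter block followed by `##` section headers.

```python
def parse_memory(text: str) -> tuple[dict, dict]:
    """Split a structured MEMORY.md into frontmatter fields (current
    state) and named sections (feedback, known issues, decisions)."""
    frontmatter, sections = {}, {}
    body = text
    if text.startswith("---"):
        # Frontmatter is delimited by --- lines; parse simple key: value pairs.
        header, _, body = text[3:].partition("\n---")
        for line in header.strip().splitlines():
            key, _, value = line.partition(":")
            frontmatter[key.strip()] = value.strip()
    # Group remaining lines under their "## " section headers.
    current = None
    for line in body.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return frontmatter, {k: "\n".join(v).strip() for k, v in sections.items()}
```

With this in place, an agent can read `frontmatter` for the current-state fields and pull only the sections it needs, rather than scanning the entire file.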
The schema also makes memory auditable. When a failure pattern recurs despite being documented in the memory file, the audit question is specific: was the memory file read? Was the relevant entry present? Was the specification updated to address the failure? If all three are yes, the failure pattern is more deeply embedded than the memory file can address alone — it requires a specification redesign, not just a memory update.
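The audit question can be written out as a decision procedure. This is a sketch of the three-question audit described above; the function name and return strings are illustrative only.

```python
def audit_recurring_failure(memory_read: bool, entry_present: bool,
                            spec_updated: bool) -> str:
    """Trace a recurring failure pattern to the layer that owns the fix."""
    if not memory_read:
        return "process failure: enforce the pre-invocation memory read"
    if not entry_present:
        return "memory failure: document the pattern in MEMORY.md"
    if not spec_updated:
        return "maintenance failure: fold the lesson into the specification"
    # All three held and the pattern still recurs: memory alone cannot
    # address it -- the specification needs a redesign.
    return "specification redesign required"
```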
The difference between memory and instruction
One distinction that required careful architectural thinking: memory and instruction are not the same thing, and should not be stored in the same place.
Instructions are what the agent must always do. They live in the agent specification and in the operating manual. They do not change based on accumulated experience. “Write in British English” is instruction. “Never hardcode credentials” is instruction. These are not the kind of thing that should be in a memory file that accumulates contextual entries.
Memory is what the agent has learned or observed in specific previous runs. “In the last batch of content, the lint check identified three instances of em dashes that were missed during self-review” is memory — it is specific, it is contextual, and it informs how the agent should approach the next batch. “Client prefers shorter paragraphs in executive summary sections based on feedback from run 2026-03-15” is memory.
Mixing these produces memory files that are hard to read and harder to maintain. When instructions drift into memory files, the specifications become incomplete — the agent’s full operating rules are spread across two documents rather than one. When memory drifts into specifications, the specifications become cluttered with contextual information that should be updated after every run but is not, because specifications are treated as stable documents.
The separation is a structural discipline. Memory files are high-change. Specifications are low-change except when there is a deliberate update. Both have their place. Neither should contain the other.
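The discipline shows up in how the two documents are touched: memory is appended to after runs, while the specification is only edited deliberately. A hedged sketch, assuming the `agent-memory/{Agent Name}/MEMORY.md` path from earlier; the function name and entry format are illustrative.

```python
from datetime import date
from pathlib import Path

def append_memory_entry(agent_name: str, note: str) -> None:
    """Record a run-specific observation in the agent's memory file.
    Standing rules never go here -- they belong in the specification,
    which has no equivalent append path."""
    memory_file = Path("agent-memory") / agent_name / "MEMORY.md"
    memory_file.parent.mkdir(parents=True, exist_ok=True)
    # Append-only: entries accumulate, dated, after significant runs.
    with memory_file.open("a", encoding="utf-8") as f:
        f.write(f"\n## {date.today().isoformat()}\n{note}\n")
```

The asymmetry is the point: memory has a cheap, routine write path because it is high-change; the specification deliberately does not, because it is low-change.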
What continuity produces
The operational value of context architecture is not just efficiency. It is quality.
An agent that reads its memory file before starting work is operating in a richer context than one that starts from scratch. It knows which failure patterns to avoid for this task type. It knows which approaches have worked and which have produced sub-standard outputs. It knows what feedback was received on the last comparable run and has already adjusted its approach.
This is the closest AI systems currently get to experience. The model does not accumulate experience between sessions. The memory architecture ensures that the relevant conclusions from experience are present in context when the session starts.
The limitation is real: the memory file is only as good as its maintenance. If agents do not update their memory files after significant runs, or if the memory files are not read before starting work (both of which can happen when the pre-invocation checklist is skipped), the architecture does not deliver its value. Memory files that go unmaintained are worse than no memory architecture — they create a false impression of continuity that does not actually exist.
The discipline required is not heroic. It is the same discipline required by any good project log: update it when something worth updating occurs, read it before starting new work. The architecture does not enforce the discipline. The specification and the operating culture do.
For boards
Context architecture is not a topic that belongs in most board discussions about AI. But there is a governance implication worth naming.
If your organisation is using AI agents for work where accumulated context matters — where the agent’s ability to act consistently with previous decisions, avoid previously identified failure modes, and build on previous work is relevant to the quality of output — the relevant governance question is: how does this system maintain continuity across sessions?
If the answer is “it does not — every session starts from scratch,” that is not necessarily a disqualifying answer. It is an accurate characterisation of the system’s limitation. Governance requires accurate characterisation. The risk is that AI systems are deployed with an implicit assumption of continuity that does not exist by design. The decisions from last month are not known to the system. The commitments made in the previous session are not remembered. What looks like a reliable, consistent agent may be a stateless process being presented as a persistent one.
Ask the question. The answer determines whether the deployment is being governed accurately.
For boards seeking a governance framework that includes questions about AI system continuity and memory architecture, the Board AI Governance Framework covers deployment approval criteria that address these characteristics.
For organisations building or reviewing AI agent architectures, contact Steven directly.