Every AI conversation starts the same way: from nothing.
You explain who you are, what you're working on, what you've already tried. You re-establish context that existed five minutes ago in a different window. The model is brilliant within a session — and amnesiac across them.
This is the memory problem. And it's why I built AIMe.
What Already Exists
To be fair — this isn't a problem nobody's noticed. ChatGPT has a memory feature that stores facts you tell it. Claude has project knowledge. Google is building Gemini's memory infrastructure with the full weight of their ecosystem behind it — your email, calendar, search history, documents, all feeding context to the model.
These are real capabilities, not vapourware. But they share two fundamental limitations.
First, they're shallow. Stored memories are flat key-value facts — “lives in Australia,” “prefers dark mode.” There's no relational reasoning, no temporal awareness — when did this become true? What did it replace? — and no automatic enrichment from conversation. You have to tell the AI what to remember. It doesn't learn.
Second — and this is the bigger issue — the memory is the lock-in. The more an AI platform knows about you, the harder it is to leave. Google understands this better than anyone. They're actively building import tools to pull your context from ChatGPT and Claude into Gemini. The pitch is convenience. The mechanism is capture.
Google is probably best positioned to build a genuinely deep harness. They already have more of your data than anyone. But that's exactly the trade: the deeper their harness goes, the more completely you're inside it. Your accumulated context becomes the switching cost. The platform that knows you best is also the one you can least afford to leave.
More on the platform memory landscape
ChatGPT's memory is effectively a notepad — a flat list of facts the model can reference. It's useful, but it's not knowledge management. There's no graph of relationships between entities, no temporal metadata, no automatic extraction from conversation. You manually control what it remembers.
Claude's project knowledge is more structured — it can hold project context across conversations within a workspace. But it's scoped to a project, not to you as a person. Switch projects and the context doesn't follow.
Gemini's approach is the most ambitious. Google can wire together Gmail, Calendar, Drive, Maps, YouTube, and Search history into a single context layer. The model would know where you've been, who you've emailed, what you've searched for, what documents you've written. That's not shallow memory — that's surveillance-grade context. The question isn't whether it's powerful. It's whether you want a corporation to be the custodian of your intelligence layer.
What a Harness Actually Is
A harness isn't an IDE with AI autocomplete bolted on. It's the full intelligence infrastructure between you and the model — how context is managed, what the model remembers, what tools it has access to, how knowledge persists. Memory management, knowledge retrieval, context assembly — these are core harness concerns, not add-ons.
Models are commoditised. The leading foundation models are roughly comparable for most tasks. The harness — the orchestration layer — is where value is created.
Most of what passes for harness engineering today is shallow: session-level tooling, file access, code execution, error recovery. Impressive within a window. But when the session ends, everything evaporates. The next conversation starts from zero.
The End of Sessions
The typical framing is “cross-session memory” — making AI aware of what happened in previous conversations. But that still treats sessions as the fundamental unit. You're still session-hopping, just with longer recall.
That's not what I wanted. I wanted to eliminate sessions entirely.
When you talk to someone you know well, there are no sessions. There's one continuous relationship with persistent context. You don't re-introduce yourself. You don't segment topics by conversation. Everything flows because the other person knows your history, your preferences, your ongoing projects, and the decisions you've already made.
That's what AIMe does. Not session management — session elimination. One continuous thread with a persistent character. Topics are routed automatically, context surfaces when it's relevant, and nothing needs to be re-explained. The same way it would work with a person.
How It Works
The persistent infrastructure underneath is what I call the Personal Context Layer — three systems working together:
Knowledge Graph — A temporal knowledge graph that stores entities, relationships, decisions, and learnings. When I decide something, it's captured. When something changes, the graph updates with temporal metadata — so the AI knows not just what's true, but when it became true and what it replaced.
Vector Database — Semantic search over raw content. Voice memos, documents, transcripts, notes — anything I've captured can be discovered by meaning, not just keywords. When a topic comes up, relevant raw material surfaces automatically.
Entity Pages — Structured reference documents for every significant entity in my life. People, projects, places, decisions. Human-readable, version-controlled, and queryable. The operational layer — what the AI pulls up when it needs to know the current state of something.
These three layers create persistent intelligence. Every interaction enriches them. Every future interaction benefits from them.
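The exact page format isn't prescribed anywhere above, but as a sketch, a hypothetical entity page (markdown, consistent with the open-formats approach; every field and value below is invented for illustration) might look like:

```markdown
# Entity: AIMe

Type: project
Status: active
Last updated: 2026-02-01

## Current State
Personal Context Layer running locally; three stores (graph, vectors, pages).

## Decisions
- Open formats only: markdown, JSON, CSV
- Context Builder works by reference, not summary

## Related
Knowledge Graph, Vector Database, Context Builder
```

Because pages like this are plain markdown, they diff cleanly under version control and can be passed to any model verbatim.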
How the knowledge graph handles time
Most AI memory is append-only — new facts pile up without invalidating old ones. The knowledge graph stores temporal metadata with every relationship: when it was created, when it was superseded, what replaced it. If I move from Perth to Sydney, the graph doesn't just add “lives in Sydney” — it marks “lives in Perth” as superseded, preserving the full history. The AI can reason about what's current, what changed, and when. Corrections propagate the same way — if an earlier fact turns out to be wrong, the graph records the correction with the original still visible as context.
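As a sketch of how supersession might be modelled (the real schema isn't shown here; the class and field names below are invented for illustration):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    created_at: datetime
    superseded_at: Optional[datetime] = None
    superseded_by: Optional["Fact"] = None

class TemporalGraph:
    def __init__(self):
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, predicate: str, obj: str) -> Fact:
        """Add a fact, superseding any currently-valid fact with the same
        subject/predicate. The old fact is closed out, never deleted."""
        now = datetime.now(timezone.utc)
        new = Fact(subject, predicate, obj, created_at=now)
        for old in self.facts:
            if (old.subject, old.predicate) == (subject, predicate) \
                    and old.superseded_at is None:
                old.superseded_at = now
                old.superseded_by = new   # history records what replaced it
        self.facts.append(new)
        return new

    def current(self, subject: str, predicate: str) -> Optional[Fact]:
        """The fact that is true now, if any."""
        return next((f for f in self.facts
                     if (f.subject, f.predicate) == (subject, predicate)
                     and f.superseded_at is None), None)

    def history(self, subject: str, predicate: str) -> list[Fact]:
        """Full timeline, oldest first; superseded facts stay visible."""
        return [f for f in self.facts
                if (f.subject, f.predicate) == (subject, predicate)]
```

Asserting “lives in Perth” and then “lives in Sydney” leaves `current` returning Sydney while `history` still shows the Perth fact, marked with when and by what it was superseded.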
The Pipeline
The system runs a three-stage pipeline for every interaction:
Context Builder — A fast, lightweight agent that pre-fetches relevant context before the main AI speaks. It identifies entities, generates targeted queries against the knowledge graph and vector database, and assembles the results. Critically, it works by reference — pulling authoritative sources and passing them through, not regenerating or summarising content. The main AI sees exactly what it needs, assembled from the source of truth.
Job Handler — The AI that actually responds. It receives the pre-assembled context and has full access to tools, files, and the knowledge infrastructure. This is where the work happens.
Memory Keeper — After the response, a separate agent extracts intelligence: new facts, entity updates, decisions, corrections. These get persisted back into the knowledge layer for next time.
One agent sets the stage. The next does the work. The third remembers everything. Every interaction both uses and enriches the knowledge layer. It's a flywheel — the more it's used, the more valuable it becomes.
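The three stages above can be sketched as a simple loop. This is a stub, not AIMe's actual interfaces: every function name here is illustrative, and the agents are replaced with trivial stand-ins so the shape of the pipeline is visible.

```python
def context_builder(message: str, knowledge: dict) -> list[str]:
    """Stage 1: pre-fetch relevant context before the main AI speaks.
    A naive keyword match stands in for entity extraction plus
    graph/vector queries."""
    return [doc for entity, doc in knowledge.items()
            if entity in message.lower()]

def job_handler(message: str, context: list[str]) -> str:
    """Stage 2: the responding agent. A real system would call a model
    here; this stub just shows the response is grounded in assembled
    context."""
    return f"Answering '{message}' using {len(context)} context document(s)."

def memory_keeper(message: str, response: str, knowledge: dict) -> None:
    """Stage 3: persist extracted intelligence for next time."""
    knowledge[f"interaction:{len(knowledge)}"] = f"{message} -> {response}"

def handle(message: str, knowledge: dict) -> str:
    context = context_builder(message, knowledge)   # set the stage
    response = job_handler(message, context)        # do the work
    memory_keeper(message, response, knowledge)     # remember everything
    return response
```

The flywheel property falls out of the structure: `memory_keeper` writes into the same store `context_builder` reads from, so each call leaves the next one better informed.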
How context assembly works
The Context Builder doesn't summarise or regenerate content. It works by reference — identifying the entities and topics in a message, generating targeted queries against the knowledge graph and vector database, then passing the authoritative results through to the main agent. The main AI sees the actual source material, not a lossy summary of it.
Queries are generated programmatically from the Context Builder's entity extraction, not from the user's raw text. This means the retrieval is precise — pulling the specific graph facts, entity pages, and raw documents that are relevant — rather than a broad semantic search that returns vaguely related content. The model does the thinking. The code does the fetching.
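A minimal sketch of that retrieval path, under stated assumptions: the store layout, dictionary-backed lookups, and all function names below are hypothetical, and a keyword match stands in for real entity extraction.

```python
def extract_entities(message: str, known_entities: set[str]) -> list[str]:
    """Cheap stand-in for the Context Builder's entity extraction."""
    return [e for e in known_entities if e.lower() in message.lower()]

def build_queries(entities: list[str]) -> list[dict]:
    """Targeted queries per entity per store: graph facts, the entity's
    page, and related raw documents. Precise fetches, not broad search."""
    queries = []
    for e in entities:
        queries.append({"store": "graph", "key": f"facts about {e}"})
        queries.append({"store": "entity_pages", "key": e})
        queries.append({"store": "vectors", "key": e})
    return queries

def fetch(queries: list[dict], stores: dict) -> list[str]:
    """Pass authoritative results through verbatim; never summarise."""
    results = []
    for q in queries:
        results.extend(stores[q["store"]].get(q["key"], []))
    return results
```

The division of labour matches the text: the model-side step (entity extraction) decides *what* matters, and plain code does the deterministic fetching against the source of truth.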
Harness Depth
The formula in AI agent development is simple: agent = model + harness.
The model is commoditised. The harness is where differentiation lives. But harnesses vary enormously in depth. Most stop at session-level tooling — give the model access to files and a terminal, and call it done. A deeper harness handles persistent knowledge, automatic context assembly, and continuous memory. Same formula. Different depth.
That depth is the part that's hard to replicate. Not because the patterns are secret — but because the accumulated knowledge is personal. Hundreds of sessions of decisions, preferences, learnings, and context that can't be recreated by installing a different tool.
Sovereignty
I self-host everything on modest hardware — an Intel NUC in rural Australia. The knowledge graph, vector database, entity pages, and pipeline all run locally. Data stays on my machine. No cloud dependency for core operations.
This isn't ideological purity — it's practical risk management. In early 2026, Anthropic's enforcement actions against third-party harnesses showed what happens when your AI infrastructure depends on someone else's platform decisions. OAuth tokens were revoked. Projects with hundreds of thousands of users were shut down. Terms of service changed overnight.
I use Claude as the underlying model via API, but the knowledge layer — the part that makes it mine — is sovereign. If I switch models tomorrow, the accumulated intelligence comes with me. Open formats only: markdown, JSON, CSV. No lock-in.
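As an illustration of what open-format persistence can look like (the directory layout, file names, and schemas here are invented, not AIMe's actual structure), a sketch that writes the same knowledge out as JSON, CSV, and markdown:

```python
import csv
import json
from pathlib import Path

def export_knowledge(root: Path, facts: list[dict], pages: dict[str, str]) -> None:
    """Persist graph facts and entity pages in open formats only,
    so the knowledge layer can move to any model or platform."""
    root.mkdir(parents=True, exist_ok=True)
    # Graph facts as JSON: structured, diffable, importable anywhere.
    (root / "graph.json").write_text(json.dumps(facts, indent=2))
    # The same facts as CSV for spreadsheet-level portability.
    with open(root / "graph.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["subject", "predicate", "object"])
        writer.writeheader()
        writer.writerows(facts)
    # Entity pages as plain markdown files.
    for name, body in pages.items():
        (root / f"{name}.md").write_text(body)
```

Nothing in these files depends on a particular model or vendor API, which is the whole point: switching providers means pointing a new harness at the same directory.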
Where This Goes
Models will continue to commoditise. Session-level harness capabilities will too — there are already dozens of excellent options, and they're converging on similar feature sets. The major platforms will build real memory. Google will probably do it best — they have the deepest well of personal data to draw from.
But the platforms that build the deepest memory will also be the ones you can least afford to leave. Your accumulated context — years of decisions, preferences, relationships, learnings — becomes their retention mechanism. The intelligence layer that makes the AI useful is the same layer that makes switching impossible.
The alternative is to own that layer yourself. Open formats, local storage, model-agnostic. The knowledge persists regardless of which model you're using or which platform you're paying for. If you switch providers, the accumulated intelligence comes with you.
That's what I'm building. Not a better chatbot wrapper. A knowledge-aware collaborator that compounds — and that belongs to me.