What your OpenClaw agents forget, and where it should go instead
Your agents do real work, and then the session ends and most of it is gone. Not because the agent was bad. Because the only place it had to put the work was the session context.
We have been running autonomous agents on OpenClaw, and one pattern keeps repeating. The agent does genuinely good work over a long session, and then the session ends and most of what it worked out is gone.
It is not gone because the agent was bad. It is gone because the only place it had to put its thinking was the session context, which is a poor place to leave anything you want to keep. It runs as one long linear thread, only one reader ever sees it, and it disappears the moment the container is torn down or the context is compacted.
This post is about the other place that work can go, and why you want it there.
What Octopad is, briefly
Octopad is a back-office your agents read and update over MCP, the same way they already read and write files. Tasks, each with a Why, a What, and a Done-when. Typed knowledge: decisions with their rationale, facts, open questions, risks. Long-form pages. Reads come back assembled, not raw: ask about a task and you get the briefing for it, not the whole workspace. It is shared across every agent and human on the workspace, and it persists. You add it to OpenClaw the way you add any MCP server, with a few lines in openclaw.json.
What it is not, and this matters for the rest of the post: it is not a telemetry layer. It does not watch your agent's token count or trace its tool calls. That is what the runtime observability tools are for. Octopad records a different thing.
Two questions a transcript cannot answer
When a run finishes, two questions come up almost immediately:
- What did the agent actually do, and why?
- How do I keep the next run from drowning in everything this one produced?
A transcript is a bad answer to both. Take them in turn.
"What did it actually do" is an audit question
Say an agent spent twenty minutes qualifying a list, triaging a backlog, or running a research sweep. Along the way it made choices, rejecting some things and pursuing others, and each of those choices had a reason at the time.
If all of that lives in the transcript, then auditing the run means reading the transcript. Hundreds of lines of interleaved thinking, tool calls, and results, in the order they happened, with no structure over them. To find out why the agent dropped the fourth candidate, you scroll. To check whether it finished everything it was supposed to, you scroll, and you hope you did not miss anything.
Now suppose the agent had written its work into Octopad as it went:
- Each unit of work is a task with a completion comment saying what shipped.
- Each non-trivial choice is a decision with its rationale attached.
- What got dropped, and why, is a risk or a note, not a sentence buried at line 900.
Auditing the run is no longer reading a transcript. It is reading a record. You open the workspace and you see what happened, why, and what is still open, without reconstructing it from scrollback. A teammate who was not in the room sees the same thing. So does the next agent that picks up the work.
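To make the contrast concrete, here is a minimal Python sketch of what such a record might look like and how an audit could read it. The class and field names (`Task`, `Decision`, `Note`, `RunRecord`, `audit`) are hypothetical illustrations, not Octopad's actual schema or API, and the sample data is invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical shapes for illustration only; not Octopad's real schema.

@dataclass
class Task:
    title: str
    done: bool = False
    completion_comment: str = ""   # what shipped, written by the agent

@dataclass
class Decision:
    summary: str
    rationale: str                 # the reason, captured when the choice was made

@dataclass
class Note:
    kind: str                      # e.g. "risk" or "note"
    text: str

@dataclass
class RunRecord:
    tasks: list[Task] = field(default_factory=list)
    decisions: list[Decision] = field(default_factory=list)
    notes: list[Note] = field(default_factory=list)

def audit(record: RunRecord) -> dict[str, list[str]]:
    """Answer the audit questions by lookup instead of scrolling."""
    return {
        "shipped": [f"{t.title}: {t.completion_comment}" for t in record.tasks if t.done],
        "open": [t.title for t in record.tasks if not t.done],
        "why": [f"{d.summary} (because {d.rationale})" for d in record.decisions],
        "dropped_or_risky": [n.text for n in record.notes],
    }

# Invented sample run, standing in for what an agent would have written.
record = RunRecord(
    tasks=[
        Task("Qualify candidates 1-3", done=True, completion_comment="3 qualified"),
        Task("Qualify candidate 5"),
    ],
    decisions=[Decision("Dropped candidate 4", "no budget owner identified")],
    notes=[Note("risk", "Candidate 2 contact may be stale")],
)

report = audit(record)
```

The question "why was the fourth candidate dropped" becomes a lookup in `report["why"]`, and "did it finish everything" becomes a glance at `report["open"]`.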
Telemetry tells you the agent ran. The work record tells you what it decided, did, and left undone.
It is easy to file this under observability. "Monitoring what my agents do" sounds like exactly that, and the observability tools are good at what they do. But they answer how the machine ran: tokens, latency, traces, errors. They do not answer what the work was. The decision an agent made, and the reason behind it, never show up in a span. They live in the work itself, and the work needs somewhere structured to go.
"Keeping context from ballooning" is a memory question
The second problem is more mechanical. An agent doing real work accumulates context. Every tool result, every intermediate finding, every prior step sits in the window and never leaves. Long runs get slower, more expensive, and eventually less coherent, as the things that matter compete with the noise for the model's attention.
You do not fix this by asking the model to remember less. You fix it by giving the work somewhere to go that is not the context window.
When an agent writes its state out to Octopad, it can stop carrying that state in its head. The plan becomes a task tree it re-reads on demand, not a paragraph it has to keep in the window. The decision it made forty steps ago becomes a knowledge item it can look up, not a block of text it has to keep in view.
The reload is where build_context earns its place, and it is the part that matters most. When the agent picks up a task, it does not pull the whole workspace or replay the entire history. It asks for the context around that one task, and Octopad assembles the briefing on the server: the dependencies, the linked pages, the decisions that bear on it, and nothing else. The agent gets a tight, relevant slice instead of an undifferentiated pile. The selection is the point. Conveying the same working context in a fraction of the tokens is the whole reason to do it this way.
It is worth being precise, because it is easy to overstate. Octopad does not magically shrink a context window. What it does is make offload and selective reload possible. Without a durable external store, an agent has no choice but to carry everything, because the moment it forgets something it cannot get it back. With one, forgetting becomes safe, and reloading becomes cheap, because it only ever pulls the slice it needs. That is the difference between an agent that degrades over a long run and one that stays sharp.
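One way to picture selective reload is as a lookup that starts from a single task and follows only its links. The sketch below is a local stand-in written for this post: the real build_context runs on the Octopad server, and its signature and selection rules are not shown here. The names `Item`, `assemble_briefing`, and `rough_size` are hypothetical, the workspace contents are invented, and the size measure is a crude word count, not a token count.

```python
from dataclasses import dataclass, field

# Local stand-in for illustration; not the Octopad API.

@dataclass
class Item:
    id: str
    kind: str                      # "task", "page", or "decision"
    text: str
    links: list[str] = field(default_factory=list)  # ids this item points at

def assemble_briefing(workspace: dict[str, Item], task_id: str) -> list[Item]:
    """Return the task plus the items it links to directly, and nothing else."""
    task = workspace[task_id]
    linked = [workspace[i] for i in task.links if i in workspace]
    return [task, *linked]

def rough_size(items: list[Item]) -> int:
    """Crude size proxy: whitespace-separated words, not real tokens."""
    return sum(len(item.text.split()) for item in items)

# Invented workspace: one task with three relevant links, plus unrelated bulk.
workspace = {
    "t1": Item("t1", "task", "Triage the import backlog", links=["t0", "p1", "d1"]),
    "t0": Item("t0", "task", "Export backlog from tracker"),
    "p1": Item("p1", "page", "Triage rubric: severity first, then age"),
    "d1": Item("d1", "decision", "Skip tickets older than one year; owners confirmed"),
    "p9": Item("p9", "page", "Unrelated onboarding notes " * 50),
    "d9": Item("d9", "decision", "Unrelated pricing decision " * 50),
}

briefing = assemble_briefing(workspace, "t1")
briefing_ids = [item.id for item in briefing]
whole_size = rough_size(list(workspace.values()))
briefing_size = rough_size(briefing)
```

The unrelated page and decision never enter the briefing, so `briefing_size` stays small however large the rest of the workspace grows. This sketch follows direct links only; a real assembler may well select differently.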
There is a second-order benefit. Because the work is typed as tasks with Done-when conditions, a miss becomes visible. An agent, a human, or a following session can walk the Done-when clauses, see what is unmet, and file the gap as its own task, instead of discovering it three weeks later. A transcript has no Done-when. It just ends.
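Walking the Done-when clauses can be sketched in a few lines. This assumes each clause carries a met/unmet flag, which is our simplification for the example; `Clause`, `Task`, and `file_gaps` are hypothetical names, not Octopad's schema, and the sample task is invented.

```python
from dataclasses import dataclass, field

# Hypothetical shapes for illustration only.

@dataclass
class Clause:
    text: str
    met: bool = False

@dataclass
class Task:
    title: str
    why: str
    what: str
    done_when: list[Clause] = field(default_factory=list)

def file_gaps(task: Task) -> list[Task]:
    """Create one follow-up task per unmet Done-when clause."""
    return [
        Task(
            title=f"Gap: {clause.text}",
            why=f"Unmet Done-when clause on '{task.title}'",
            what=clause.text,
            done_when=[Clause(clause.text)],
        )
        for clause in task.done_when
        if not clause.met
    ]

# Invented example: one clause satisfied, one missed.
sweep = Task(
    title="Research sweep",
    why="Team needs a sourced overview before planning",
    what="Summarize sources and flag disagreements",
    done_when=[
        Clause("Ten sources summarized", met=True),
        Clause("Conflicting claims flagged"),
    ],
)

gaps = file_gaps(sweep)
```

Each gap is itself a task with a Why, a What, and a Done-when, so the miss lands on the board in the same form as any other work.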
It is also a cost story
Selection instead of dumping has a direct consequence: fewer tokens per run. Every time the agent receives an assembled briefing instead of reloading its history or pulling whole pages it does not need, those are tokens it did not spend. On one run the difference is real. Across a long-running agent, or many runs a day, it compounds.
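The compounding is plain multiplication, sketched below. Every number here is a placeholder chosen to make the arithmetic easy to follow, not a measurement from Octopad or from any provider's price list, and the function names are ours.

```python
def tokens_saved_per_day(full_reload: int, briefing: int,
                         reloads_per_run: int, runs_per_day: int) -> int:
    """Tokens not spent when each reload is a briefing instead of a full reload."""
    return (full_reload - briefing) * reloads_per_run * runs_per_day

def cost_saved_per_day(tokens_saved: int, price_per_million: float) -> float:
    """Convert saved tokens to money at a given price per million tokens."""
    return tokens_saved / 1_000_000 * price_per_million

# Placeholder inputs, purely illustrative.
saved = tokens_saved_per_day(
    full_reload=20_000, briefing=2_000, reloads_per_run=10, runs_per_day=50
)
today = cost_saved_per_day(saved, price_per_million=1.0)
later = cost_saved_per_day(saved, price_per_million=4.0)
```

The token saving does not depend on price, but its value scales with it: if the price per token quadruples, so does the value of the same saving.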
Right now that mostly reads as tidiness, because inference is cheap. It is reasonable to expect it will not stay this cheap. A lot of today's pricing looks like a land grab, set below what serving tokens will cost once the market settles and the subsidies thin out. You do not have to commit to a specific prediction to take the safe side of the bet: fewer tokens for the same work is strictly better, and the gap matters more the more each token costs. A pipeline that is already lean on tokens is money saved later, not just a cleaner window now. The cheap time to build that habit is while tokens are cheap, not after the bill arrives.
What this looks like in practice
None of this asks much of you. You register Octopad as an MCP server in openclaw.json, alongside the others.
Then the agent's instructions point it at the workspace: start the session, find or create the task for the work, capture decisions as it makes them, write outcomes back before it finishes.
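The sequence in those instructions can be sketched against an in-memory stand-in. `Workspace` and its methods (`find_or_create_task`, `record_decision`, `complete_task`) are hypothetical names invented for this example; in a real setup each step would be an MCP tool call, and the actual tool names may differ. The task and decision contents are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    title: str
    why: str
    what: str
    done_when: list[str]
    status: str = "open"
    completion_comment: str = ""

@dataclass
class Workspace:
    """In-memory stand-in for the shared workspace; not the Octopad API."""
    tasks: dict[str, Task] = field(default_factory=dict)
    decisions: list[tuple[str, str, str]] = field(default_factory=list)

    def find_or_create_task(self, title: str, why: str, what: str,
                            done_when: list[str]) -> Task:
        # Reuse an existing task so a later session does not duplicate it.
        if title not in self.tasks:
            self.tasks[title] = Task(title, why, what, done_when)
        return self.tasks[title]

    def record_decision(self, task_title: str, summary: str, rationale: str) -> None:
        self.decisions.append((task_title, summary, rationale))

    def complete_task(self, title: str, comment: str) -> None:
        task = self.tasks[title]
        task.status = "done"
        task.completion_comment = comment

def run_session(ws: Workspace) -> None:
    # 1. Find or create the task for this piece of work.
    task = ws.find_or_create_task(
        "Vendor shortlist",
        why="Procurement needs options before the contract review",
        what="Compare vendors and pick at most two",
        done_when=["Shortlist written", "Rejections have reasons"],
    )
    # 2. Capture decisions as they are made, with the reason attached.
    ws.record_decision(task.title, "Dropped vendor B", "no security report available")
    # 3. Write the outcome back before the session ends.
    ws.complete_task(task.title, "Shortlist of two vendors written to the workspace")

ws = Workspace()
run_session(ws)
```

After the session, `ws` holds the finished task, its completion comment, and the decision with its rationale, none of which depends on the transcript surviving.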
The agent does the rest, because reading and writing structured work is just more tool calls, and tool calls are the thing these agents are already good at. The shift is not in how the agent runs. It is in where the work lands when the run is over.
What stays
The way to tell whether any of this is working is to look at what is left after the agent is gone.
With only the session to show for it, what is left is a transcript someone might read once and a container that no longer exists. With Octopad underneath, what is left is the work itself: the tasks that got done, the decisions and the reasons behind them, the things that did not get finished and are now on the board, waiting. The run was the way the work got made. The record is the thing you keep.
The post we published earlier on MCP account switching ended on a version of this same point, almost in passing: the run is ephemeral, the artifacts written back to the workspace are the durable record. This is that point, made on purpose. An OpenClaw agent is a good worker. Give it somewhere for the work to go.