Skip to main content
DetGaao

Dream: an AI librarian for your notes (Part 1)

The more I work with AI day to day, the messier my own notes get. Dream is a small overnight process that helps consolidate my knowledge base without rewriting it. Part 1: the principle and the build. Part 2 in two weeks, with the data.

May 19, 20267 min readMichael Fellner

The more I work with AI day to day, the stranger my own notes get. Lessons I learned three months ago resurface as fresh discoveries in every new session. Old project files contradict newer ones, goal posts move, sometimes because I change my mind, or a client changes their mind. Sometimes simply because after pausing a project for a few weeks better concepts emerge. I sit on a folder of decisions from January and rediscover the same conclusion in March, because nothing connected the two. Being human comes with limitations. For me, one of those is that remembering the day of the week can sometimes be challenging. The AI is getting better at helping me. My knowledge base is getting worse at reflecting the current reality.

I built something this month to test a different approach. I called it Dream. The idea is not original. Anthropic announced their version, also called Dreaming, two weeks before I started building mine. What I think is worth writing about is not the build itself, but a reframe of what the mechanism is for. This is Part 1: the principle and the architecture. Part 2, in two weeks, will show what actually happened: did I waste my time and energy or does this approach have value.

Facts change, AI does not notice

Three ways things can go sideways, all familiar to anyone running their own knowledge base alongside daily AI use.

The first is volume. Notes pile up faster than I can review them. Every session generates ideas, decisions, fragments. Most get saved. Few get read again. Which really is pretty much exactly the same thing that happens with my handwritten notes, notebooks, PostIt's ...

The second is staleness. A decision I made in January about how to position DETGAAO is still in my files in May, even though I made a sharper version of the same decision in March. Both files exist. Neither knows about the other.

The third is repetition. Lessons that took me weeks to learn vanish between sessions. The next time I open a fresh project, I work them out again. The AI is doing fine. It is the input that is broken, not the model. But the input is mine, and I am not actively enough maintaining it.

So I have a folder of "decisions" that is slowly becoming a folder of "decisions that may or may not still be true." Nobody is checking. Nobody is reading the older files alongside the newer ones. The consolidation step does not exist. Again, similar to handwritten notes, where you may attempt to consolidate, spending forever sifting through notebooks and loose papers trying to figure why and when you wrote things down.

Anthropic's Dreaming: agents stop forgetting

On May 6, at Code with Claude in San Francisco, Anthropic announced a feature called Dreaming. Mahesh Murag, a product manager on the platform team, walked through the design. The mechanism is straightforward. A scheduled background job reads an agent's past sessions and existing memory store. It looks for patterns, recurring mistakes, and useful workflows. It writes a new, reorganized memory store: duplicates merged, contradictions resolved, new insights surfaced. Mahesh's presentation inspired me.

The key safety rule, in Anthropic's own framing: the input store is never modified. The output is reviewable. You can throw it away if you do not like the result.

Early numbers from beta customers were significant. Harvey, the legal AI startup, saw roughly six times the task completion rate on a class of legal-drafting problems once Dreaming was turned on. Rakuten, on the related Memory feature, reported a 97% reduction in first-pass errors. These are vendor-internal results and should be read as directional, not gospel. I am generally a skeptic. That said, they are substantial enough to take seriously, even by me.

The problem Anthropic is solving is real: AI agents forget what worked across sessions, so the same tasks fail in the same ways every time. Dreaming is conceptualized to fix that.

Same mechanism, different target

Watching the talk, the part that caught me was that the same mechanism, pointed at a different problem, may be useful to me personally. The thing I want consolidated is not an AI agent's memory. It is my own knowledge base. The dev-brain, the project notes, the handovers, the decisions log. The places I go to think.

The rule change is small but it matters. Anthropic's Dreaming rewrites the agent's memory store, and the agent uses the new version next time. For a human knowledge base, that is the wrong shape. If you let an AI rewrite your notes, you lose authorship of your own thinking. Even when the AI gets it right, the file is now written in a voice that is not yours. Even when it gets it wrong, the wrong version is now the source of truth.

So the principle I started from: AI should be a librarian, not a co-author.

A librarian organizes, indexes, flags duplicates, points out where something might fit better. A co-author rewrites things in their own voice. The librarian leaves the books alone. The co-author opens them up and changes the words.

For my own knowledge base, I want the librarian.

What I built (three layers and a gate)

Dream is the practical version of that principle. It runs as one scheduled Skill, with three outputs and a strict read-only rule on my actual files.

Always-on lessons. A small shared file of recurring lessons. A pattern has to appear in at least three separate places before it earns a spot. Three is an assumption I had to make. Is it enough? Too strict? Too loose? The file loads into every future agent session, so the same lesson does not need to be re-learned. It is tiny by design. It is not my notes. It is a separate scratchpad.

Drift reports. Per-project reports of what has gone stale, what contradicts what, what is no longer accurate. They sit in a folder. I read them when I want to. Nothing auto-loads into my working memory.

Gated fixes. Suggested factual corrections wait in a queue. Nothing touches a real file until I approve each one, individually. The queue is short and concrete: here is the file, here is the line, here is what Dream thinks should change, here is why. I approve, edit, or reject.

The gating, which is the part that took the most care:

  • Read-only on my notes. My files and the memory index are never written to by Dream. Two independent code reviews and an audit confirmed it.
  • A cheap calculator decides whether tonight is worth running, not the AI. Quiet nights cost essentially nothing. Not creating a process that destroys my usage and token budget was a key consideration in conceptualizing this 'tool'.
  • Single save-point. If anything fails mid-run, the system rolls back. As if it never ran.
  • 50 automated checks. Independent audit. No safety issues found.

The nightly loop is small: check what has changed across projects, dream only on the ones that moved, write a report and update the lessons file, never touch the notes. That is it.

It is in supervised trial right now. I read every output. Eventually I want it portable so it can sit on top of any knowledge setup (Obsidian, Notion, raw markdown folders). For now, it runs on mine.

What Part 2 will answer

This post is the principle and the architecture. The empirical post lands in two weeks, around the start of June, then we will know if I lost my mind and am suffering from AI psychosis or if I found a valid way to improve the project and development process. I'll take bets for the next few days.

A few questions I will be trying to answer:

  • Acceptance rate on gated fixes. How often did Dream get a factual correction right? How often did I overrule it? The ratio is the trust signal.
  • Surprise catches. Anything Dream flagged that I would not have found on my own. (Which shouldn't be too hard)
  • False positives. Where Dream confidently suggested things that were wrong. I fully expect to get those. I am curious if some are going to be wrong in instructive ways.
  • Where the design held up. Where it broke.

I will be honest about it. If the answer is "this is more trouble than it is worth and I am going to turn it off," I am seeing that as just as useful of a finding as the opposite. If the answer is "this is genuinely changing how I work," that is the post I obviously would prefer to write. Both are real outcomes. The piece in two weeks writes itself either way.

If you are thinking about something similar for your own setup, I would rather you wait until Part 2 to decide. Right now, the idea or principle is more interesting than the build. The data will tell us whether the principle is worth running with.

Want help thinking through what this changes for your marketing operations?

Start a conversation