
Karpathy's Self-Improving Second Brain: LLMs That Maintain Their Own Knowledge Base

Andrej Karpathy stopped asking LLMs questions and started letting them maintain an entire wiki. 100 articles, 400K words, zero manual editing. Here's how the setup works and why it matters.


Most of us use LLMs like a search engine with a chat box. Ask a question, get an answer, move on. Maybe paste it into a doc somewhere. Andrej Karpathy has been doing something different. He's been feeding raw information into a system where the LLM itself organizes, indexes, and maintains the knowledge. He barely touches the output directly.

He shared this on X recently and it caught my attention because it's the kind of setup that sounds obvious in hindsight but nobody really does. A large fraction of his recent token usage goes into "manipulating knowledge" rather than manipulating code. That shift is interesting.

How the Setup Actually Works

The tool stack is surprisingly boring. Obsidian for storage. That's basically it on the app side. The interesting part is the workflow.

Everything starts in a /raw folder. Articles, papers, repos, datasets, all clipped from the web using Obsidian's Web Clipper. Markdown and images. No fancy preprocessing, just dump it in.

Then the LLM takes over. It incrementally "compiles" a wiki from all that raw material. It writes summaries, creates visualizations, maintains index files, and organizes everything into a browsable knowledge base. Karpathy rarely edits any of it by hand.

The output isn't chat responses. It's new Markdown files, Marp slideshows, matplotlib charts, all rendered inside Obsidian. And here's the self-improving part: those outputs get filed back into the wiki. So the knowledge base grows and refines itself over time.

tl;dr on the setup

  • Raw articles go into a /raw folder via Obsidian Web Clipper
  • LLM compiles a wiki with summaries, visualizations, index files
  • Auto-maintained indexes replace RAG: the LLM reads its own summaries to find related docs
  • Outputs (markdown, slides, charts) get filed back into the wiki
  • Scale: ~100 articles, ~400K words across several topics
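In code terms, the compile step is just a loop over the raw folder. Here's a minimal sketch of that loop; the folder layout, the `compile_wiki` function, and the placeholder `summarize` (which stands in for the actual LLM call) are my assumptions, not Karpathy's actual script:

```python
from pathlib import Path

def summarize(text: str) -> str:
    # Placeholder for the LLM call; a real setup would prompt the
    # model for a few-sentence summary of the article.
    first = text.strip().splitlines()[0] if text.strip() else ""
    return first.lstrip("# ").strip()[:120]

def compile_wiki(raw: Path, wiki: Path) -> Path:
    """Turn every markdown file in raw/ into a summarized wiki page
    and maintain a single index.md that lists all of them."""
    wiki.mkdir(parents=True, exist_ok=True)
    entries = []
    for doc in sorted(raw.glob("*.md")):
        summary = summarize(doc.read_text(encoding="utf-8"))
        # File the output back into the wiki so it becomes future context.
        (wiki / doc.name).write_text(f"# {doc.stem}\n\n{summary}\n", encoding="utf-8")
        entries.append(f"- [[{doc.stem}]]: {summary}")
    index = wiki / "index.md"
    index.write_text("# Index\n\n" + "\n".join(entries) + "\n", encoding="utf-8")
    return index
```

Run something like this after every clipping session and the index stays current; swap `summarize` for a real model call and you have the core of the loop.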

Why This Is a Better RAG

The indexing approach is the part that got me thinking. Instead of building a vector database, embedding chunks, setting up retrieval pipelines, and debugging why your RAG keeps returning garbage, Karpathy just has the LLM maintain its own index files. Brief summaries of every document. When it needs context on a topic, it reads the index, identifies relevant files, and reads those.

At ~100 articles this is totally feasible. You don't need vector search when your index fits in context. And the LLM wrote the summaries itself, so it knows what's in them. No embedding drift, no chunking artifacts, no retrieval pipeline to maintain.
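As a sketch, the whole "retrieval" step collapses to reading one file and following its links. The keyword match below stands in for the LLM reading the index and judging relevance itself; the function and file names are assumptions:

```python
from pathlib import Path

def retrieve(wiki: Path, query: str, limit: int = 3) -> list[str]:
    """Read the LLM-maintained index, pick related entries, and load
    those full documents as context. The keyword match here is a
    stand-in for the LLM deciding which entries are relevant."""
    index = (wiki / "index.md").read_text(encoding="utf-8")
    docs = []
    for line in index.splitlines():
        if line.startswith("- [[") and query.lower() in line.lower():
            # Entries look like "- [[filename]]: summary"; pull the filename.
            name = line.split("[[", 1)[1].split("]]", 1)[0]
            docs.append((wiki / f"{name}.md").read_text(encoding="utf-8"))
            if len(docs) == limit:
                break
    return docs
```

No embeddings, no vector store: the "database" is a markdown file the model wrote itself.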

Obviously this doesn't scale to millions of documents. But for a personal knowledge base? For a team wiki? It might be all you need. I've spent way too many hours debugging RAG setups that could have been replaced by a well-organized folder and an LLM that knows where things are.

The key insight: when agents maintain their own memory layer, they don't need massive context windows. They need clean file organization and the ability to query their own indexes. Simple and boring beats clever and fragile.

More stuff like this, twice a week

I dig through 110+ tech sources so you don't have to. AI workflows, developer tools, what's actually working in production, no fluff.

Early-adopter insights
Ship, don't just code
Free forever · Unsubscribe anytime

This Is Part of a Bigger Shift

The second brain thing isn't happening in isolation. Karpathy has been pushing hard on what he calls "agentic engineering" for a while now.

His AutoResearch project is a good example. He built autonomous AI agents that ran 700 experiments in 2 days on a single GPU. Found 20 optimizations that improved LLM training time by 11%. The whole thing runs on a single markdown prompt and about 630 lines of training code.

And it works for others too. Shopify CEO Tobi Lütke tested AutoResearch overnight on internal data and got a 19% performance improvement across 37 experiments. Overnight. While sleeping.

Karpathy now runs what he calls "tmux grids of agents." Multiple agents running in parallel, each working on different experiments or tasks. He says "the old single-file IDE is dead. The new unit is teams of agents."

Whether you agree with the framing or not, the pattern is clear. The workflow is shifting from "human writes code, AI suggests completions" to "human directs and orchestrates, agents execute." The second brain setup is the knowledge management version of the same idea.

What You Can Actually Do With This

You don't need to be Karpathy to try this. The setup is genuinely simple. Here's what a minimal version looks like:

1. Pick a topic you're actively learning about. Could be a new framework, a domain you're building in, whatever. Don't try to build a "second brain for everything" on day one.

2. Set up the raw folder pattern. Obsidian with Web Clipper works great. But honestly any markdown folder works. The tool matters less than the habit of dumping interesting things into one place.

3. Write a prompt that compiles your raw notes into something structured. Have the LLM create an index file, write summaries of each article, and organize them by subtopic. Tell it to update the index when new articles arrive.

4. Ask the LLM to generate outputs, not answers. Instead of "what does paper X say about Y," ask it to write a comparison document, or generate a slide deck, or create a visualization. Make it produce artifacts that go back into the system.

5. Let it compound. The whole point is that outputs feed back in. A summary the LLM wrote today becomes context it uses tomorrow. This is where the "self-improving" part kicks in.
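For step 3, a hypothetical compile prompt might look like the constant below. Every detail in it (the file names, the rules, the summary length) is an assumption you'd adapt to your own vault, not a prompt from Karpathy's setup:

```python
# A hypothetical compile prompt for step 3; all specifics are assumptions.
COMPILE_PROMPT = """\
You maintain a wiki compiled from the files in /raw.
For each file in /raw that is not yet listed in index.md:
1. Write a 3-5 sentence summary and save it as a new wiki page.
2. Add a one-line entry for it to index.md under the right subtopic.
3. If it contradicts an existing page, flag the conflict, don't overwrite.
Never edit files in /raw. Only add to or update the wiki.
"""
```

The "never edit /raw" rule matters: the raw folder is your ground truth, and everything downstream of it is regenerable.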

I'm actually going to try this for my own research workflow. I read a lot of stuff for this newsletter and most of it disappears into browser tabs that never get reopened. Having an LLM maintain a structured version of everything I've read sounds genuinely useful.

The Obvious Caveats

A few things to keep in mind before you go all in on this:

  • Scale limits are real. 100 articles with auto-indexes works. 10,000 articles probably needs actual RAG or some hybrid approach. Know when to graduate from the simple version.
  • LLM summaries can drift. If the model misunderstands an article and writes a bad summary, that bad summary becomes part of the system's context going forward. Self-improving can also mean self-corrupting if you never check the outputs.
  • Token costs add up. Having an LLM reindex and rewrite summaries every time you add a few articles isn't free. At Karpathy's scale it's probably fine. If you're dumping hundreds of articles a day, think about incremental updates.
  • This isn't a replacement for understanding. Having a well-organized knowledge base is great. But if you never actually read the source material yourself, you're trusting the LLM's interpretation of everything. Use it as an augmentation layer, not a replacement for your own reading.
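On the token-cost point, one cheap mitigation is to hash each raw file and only re-summarize what changed between runs. A minimal sketch, with an assumed state-file format:

```python
import hashlib
import json
from pathlib import Path

def changed_docs(raw: Path, state_file: Path) -> list[Path]:
    """Return only the raw docs whose content changed since the last
    run, so the LLM re-summarizes those instead of reindexing
    everything. The JSON state file is an assumption, not part of
    Karpathy's described setup."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    changed = []
    for doc in sorted(raw.glob("*.md")):
        digest = hashlib.sha256(doc.read_bytes()).hexdigest()
        if state.get(doc.name) != digest:
            changed.append(doc)
            state[doc.name] = digest
    state_file.write_text(json.dumps(state))
    return changed
```

Feed only the returned files to the compile prompt and the cost of adding one article stays roughly constant instead of growing with the vault.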

Why This Matters

The thing I keep coming back to is how simple this setup is. Obsidian, a raw folder, an LLM, and a prompt that says "organize this and maintain an index." No vector database. No embedding pipeline. No retrieval infrastructure. And it apparently works well enough for one of the sharpest people in AI to use it as his primary knowledge management system.

Between this, AutoResearch running 700 experiments autonomously, and the "tmux grids of agents" workflow, I think Karpathy is showing us what developer workflows look like in 12-18 months. The shift from "use AI to write code faster" to "direct teams of agents that manage knowledge and run experiments" is already happening. It's just not evenly distributed yet.

Credit to Charly Wargnier (@DataChaz) for the detailed breakdown of Karpathy's setup on X. Worth following if you're into this kind of thing.

P.S. If you try building your own version of this, I'd genuinely love to hear how it goes. The gap between "Karpathy does it" and "normal devs can replicate it" is usually where the interesting learnings happen.


Written by Benjamin Loh, curator of Tech Upkeep