Instead of making the LLM rediscover knowledge from raw documents on every query — the RAG way — Karpathy proposes having the LLM compile a structured, interlinked wiki once at ingest time. Knowledge accumulates. The LLM maintains the wiki, not the human.
| Dimension | memex / LLM Wiki | RAG |
|---|---|---|
| Knowledge Accumulation | ✦ Compounds with each ingest | Stateless — restarts every query |
| Maintenance Cost | ✦ LLM does the filing | Chunking pipelines need upkeep |
| Scale Ceiling | ~50–100K token hard limit | ✦ Millions of documents, no ceiling |
| Human Readability | ✦ Plain markdown, fully auditable | Black-box vector space |
| Semantic Retrieval | Explicit links only | ✦ Fuzzy semantic matching |
| Error Persistence | Errors compound into future pages | Errors are ephemeral per query |
| Multi-user / RBAC | None — flat file system | ✦ Supported by most platforms |
| Query Latency | ✦ Fast at personal scale | Embedding search overhead |
| Setup Complexity | ✦ Just folders & markdown | Vector DB, chunking, embeddings |
| Vendor Lock-in | ✦ Zero — any model, any editor | Often tied to embedding provider |
| Cross-reference Quality | ✦ Rich, named wikilinks | Implicit via similarity score |
| Fine-tuning Pathway | ✦ Wiki becomes training data | Raw chunks are poor training data |
Reading papers, articles, and reports over weeks or months on a single topic. Karpathy's primary use case — his ML research wiki has ~100 articles and 400K words, all compiled without writing a line manually.
Goals, health tracking, journal entries, podcast notes — building a structured picture of yourself over time. The LLM creates concept pages for recurring themes and connects them across months or years.
Engineering team internal docs, competitive analysis, trip planning. Works well if one person owns ingestion and the team reads via Obsidian. Breaks at concurrent writes or RBAC requirements.
AI agent systems that need persistent memory between sessions. The wiki prevents agents from "waking up blank." Session context is compiled rather than re-derived, dramatically cutting token overhead.
API parameter specs, version constraints, legal records, medical protocols. LLM-generated pages can silently misstate critical details. Manual cross-checking eliminates the maintenance savings that make this pattern attractive.
Millions of documents, hundreds of users, RBAC, audit trails, regulatory compliance. The flat file architecture cannot address concurrency, access control, or governance. This is a personal productivity hack, not enterprise infrastructure.
A breakdown of where the pattern generates real signal vs. where the noise grows louder.
Moving synthesis from query-time (RAG) to ingest-time (wiki) is a genuinely novel architectural choice with real benefits for accumulation. This is the core innovation and it holds up to scrutiny.
Offloading the maintenance bottleneck — the work that kills all human-maintained wikis — to an LLM is elegant and correct. The pattern solves a real problem people actually have.
Community hyperbole. RAG and the wiki pattern solve different problems at different scales. The wiki pattern is a personal productivity tool, not a replacement for enterprise-grade retrieval infrastructure.
Real and underweighted by enthusiasts. The persistent-error problem is structural — not a bug to fix with better prompting. It's a genuine trade-off the pattern makes, and it's most dangerous in precision-critical domains.
Karpathy's framing of sharing an "idea file" vs. a code repo — letting each person's agent instantiate a custom version — is genuinely forward-thinking about how patterns propagate in the agent era.
Karpathy explicitly scoped this to individual researchers. The limitations (no RBAC, no concurrency, ~50K token ceiling) are not bugs — they are consequences of the design assumptions. Enterprise use requires entirely different infrastructure.
The index.md breaks around 100–150 articles when it stops fitting cleanly in context. The community-endorsed fix is qmd — built by Tobi Lütke (Shopify CEO) and explicitly recommended by Karpathy himself. It's a local, on-device search engine for markdown files using hybrid BM25 + vector search with LLM re-ranking. No API calls, no data leaves your machine.
Install and integrate:
```shell
npm install -g @tobilu/qmd
qmd collection add ./wiki --name my-research
qmd mcp
```
The qmd mcp command exposes it as an MCP server so Claude Code uses it as a native tool — no shell-out friction. Three search modes: keyword BM25 (qmd search), semantic vector (qmd vsearch), and hybrid re-ranked (qmd query). Use the JSON output flag to pipe results into agent workflows.
Before reaching for qmd, a simpler scaling step is to split index.md into domain-specific sub-indexes: wiki/ml-theory/index.md, wiki/infrastructure/index.md, etc. A root index.md points to sub-indexes, keeping any single file within comfortable context window bounds.
Define this in your schema file (CLAUDE.md) so the LLM knows which sub-index to update on ingest and which to consult on query. The LLM reads only the relevant sub-index, not the full corpus.
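A minimal root index.md under this layout might look like the following; the domain names and one-line descriptions are illustrative, matching the hypothetical paths above:

```markdown
# Wiki Index

Domain sub-indexes (the schema tells the LLM which one to open):

- [ML Theory](ml-theory/index.md): papers, concepts, open questions
- [Infrastructure](infrastructure/index.md): tooling, deployments, configs
```

The root file stays tiny no matter how large the wiki grows; only the sub-index for the active domain enters context.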
From the LLM Wiki v2 community extension: structure knowledge in tiers by confidence and stability. Raw observations live in low-confidence pages. After multi-source confirmation, the LLM promotes them to "established" pages. Core principles graduate to a high-confidence tier that rarely changes.
Each tier is more compressed, more confident, and longer-lived than the one below it. The LLM only loads lower tiers when deeper detail is needed. This naturally keeps context window usage lean as the wiki grows — you're querying the compressed tier first, the full tier only on demand.
The flat-file architecture has no access control by default. The cleanest mitigation is to expose the wiki through an MCP server rather than as raw files. The open-source llmwiki project (lucasastorian/llmwiki) does exactly this: it wraps the Karpathy pattern with a FastAPI backend, Supabase auth, and MCP endpoints. Claude connects via MCP and has read/write tools — but only through the authenticated layer.
For self-hosted setups: build a minimal FastAPI wrapper that authenticates via JWT before allowing MCP tool calls. The markdown files stay on disk; the API layer enforces who can read and write. This pattern is already used in production implementations like Hjarni.
For small teams, a simpler pattern than full RBAC: separate wiki/shared/ from wiki/private/ directories, with git branch-level access control. The MCP server only exposes the shared/ tree to team members; personal pages stay in private/ on a branch only you merge from.
The LLM Wiki v2 pattern calls this "mesh sync with shared/private scoping." The schema file defines what can be promoted from private to shared and the conditions for that promotion.
The LLM Wiki v2 pattern solves persistent errors by making uncertainty explicit. Every factual claim in a wiki page carries metadata: how many sources support it, when it was last confirmed, and a confidence score (e.g., 0.85). Confidence decays with time and strengthens with reinforcement from new sources.
Implement this in YAML frontmatter on each page:
```yaml
confidence: 0.85
sources: 2
last_confirmed: 2026-04-01
```
The lint pass checks for pages with decayed confidence scores and flags them for re-verification. The LLM can say "I'm fairly sure about X but less sure about Y" — it's no longer a flat collection of equally-weighted claims.
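A minimal sketch of that decay check. The linear decay rate and the re-verification threshold are assumptions, not values from the v2 pattern:

```python
import datetime as dt

# Hypothetical decay rule: confidence loses `decay_per_month` for each month
# since last confirmation; pages dropping below the threshold get flagged.
def decayed_confidence(confidence: float, last_confirmed: dt.date,
                       today: dt.date, decay_per_month: float = 0.02) -> float:
    months = max(0, (today - last_confirmed).days) / 30.0
    return max(0.0, confidence - decay_per_month * months)

def needs_reverification(page: dict, today: dt.date,
                         threshold: float = 0.6) -> bool:
    # `page` holds the parsed frontmatter fields shown above.
    score = decayed_confidence(page["confidence"], page["last_confirmed"], today)
    return score < threshold
```

A weekly lint script would parse each page's frontmatter and collect every page where `needs_reverification` returns true.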
When new information contradicts an existing wiki claim, the wrong pattern is leaving the old claim with an appended note. The right pattern: the new claim explicitly supersedes the old one. The old version is preserved but marked stale with a timestamp and link to what replaced it — version control for knowledge, not just for files.
Define supersession in your schema: the LLM's ingest instructions should check for contradictions against existing pages before writing, and when found, issue a formal supersession record rather than a quiet edit.
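One way to represent a supersession record is extra frontmatter on the superseded page; these field names are illustrative, not part of the original pattern:

```yaml
status: superseded
superseded_by: concepts/replacement-page.md
superseded_at: 2026-04-12
supersession_reason: contradicted by two newer primary sources
```

The old body stays intact below the frontmatter, so the history remains auditable while queries route to the replacement.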
Community implementation ELF (Eli's Lab Framework) uses a strict typed-entity system where every page is declared as a type (library, project, person, concept, decision) and every link between pages has a typed relationship (uses, depends-on, contradicts, caused, fixed, supersedes). This prevents the LLM from creating duplicate concept pages under different names.
A 5-step incremental ingest pass: diff → summarize → extract → write → image. The extract step enforces entity typing before the write step creates any new page — if a typed entity already exists, it merges rather than duplicates.
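The extract-step guard can be sketched as a small registry lookup: normalize the entity name, key it by type, and merge into an existing entry instead of creating a duplicate. The normalization rule and record shape here are assumptions:

```python
# Before writing a page, check the typed-entity registry; merge rather than
# duplicate when a typed entity already exists under a variant name.
def normalize(name: str) -> str:
    return name.lower().replace(" ", "-").replace("_", "-")

def upsert_entity(registry: dict[str, dict], name: str, etype: str,
                  facts: list[str]) -> dict:
    key = f"{etype}:{normalize(name)}"
    if key in registry:                      # typed entity exists: merge facts
        entry = registry[key]
        entry["facts"].extend(f for f in facts if f not in entry["facts"])
    else:                                    # new typed entity: create it
        entry = registry[key] = {"name": name, "type": etype,
                                 "facts": list(facts)}
    return entry
```

Because "Vector DB" and "vector_db" normalize to the same key, the second ingest merges into the first page instead of spawning a twin.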
Community analysis of 120+ comments on Karpathy's gist converged on a clear finding: most people who try this pattern get the folder structure right and still end up with a wiki that slowly becomes unreliable, redundant, or abandoned. The difference between a wiki that compounds and one that quietly rots comes down to operational discipline — not technical setup.
The instinct is to have the agent write directly into the wiki. This creates the rot. The principle: your curated/verified vault and the agent's working vault (speculative writes, messy drafts, exploratory connections still being tested) must be physically separate directories. Only the human promotes content from agent-working to vault.
Structure: wiki/verified/ (human-promoted, high trust) vs wiki/staging/ (agent writes here first). The lint pass reviews staging and proposes promotions. You approve them. The signal-to-noise ratio in your verified wiki stays high permanently.
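A minimal promotion helper under that staging/verified layout; the approval decision itself stays with the human, and the function only performs the approved move:

```python
import shutil
from pathlib import Path

# Promote a human-approved page from the agent's staging tree into the
# verified tree. The agent never writes to verified/ directly.
def promote(page: str, staging: Path, verified: Path) -> Path:
    src = staging / page
    dst = verified / page
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst
```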
The number one reason wikis rot: the human stops ingesting because life gets busy. The fix is removing the human from the trigger loop. Set up a cron job or a filesystem watcher on raw/ that automatically triggers the ingest command whenever a new file lands. The human's job shrinks to: drop file, walk away.
Implementations: inotifywait on Linux, fswatch on macOS, or a Node.js chokidar watcher. On drop, the watcher calls your ingest script which runs the LLM compilation pass. You get a notification when it completes.
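If you'd rather avoid a watcher dependency entirely, a stdlib polling loop covers the same job at personal scale. The interval and the ingest callback are assumptions; the scan function is separated out so it can be tested without the loop:

```python
import time
from pathlib import Path
from typing import Callable

# Return files in raw/ that haven't been seen yet, updating the seen-set.
def scan_new(raw_dir: Path, seen: set[str]) -> list[Path]:
    new = [p for p in sorted(raw_dir.glob("*"))
           if p.is_file() and p.name not in seen]
    seen.update(p.name for p in new)
    return new

# Poll raw/ forever; hand each newly dropped file to the ingest callback.
def watch(raw_dir: Path, on_drop: Callable[[Path], None],
          interval: float = 5.0) -> None:
    seen: set[str] = set()
    while True:
        for path in scan_new(raw_dir, seen):
            on_drop(path)                    # e.g. run the LLM ingest script
        time.sleep(interval)
```

Polling is cruder than inotify but has zero dependencies and is plenty responsive for a drop-file-walk-away workflow.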
Lint passes don't happen if you have to remember to run them. The solution is automating them on a schedule — a weekly cron job that runs the lint command, writes a report to a lint-reports/ directory, and sends you a summary notification. The report tells you: N orphan pages found, N contradictions flagged, N pages with decayed confidence.
You review the report (5 minutes), decide which flagged items to address, and optionally run the LLM to resolve them. The system is telling you what needs attention rather than you having to inspect everything.
A community-evolved enhancement to Karpathy's original: add an identity-aware filter to your schema. A prompt section that tells the LLM exactly who the wiki is for, what their goals are, and what "high-signal" means in that context. The LLM then scores sources before ingesting and rewrites that filter over time based on what has proven useful.
This prevents the wiki from becoming a neutral encyclopedia of everything you've read. It stays opinionated, relevant, and tuned to your actual work. Over months, the schema itself becomes a reflection of what you find worth knowing — a second-order artifact of the system.
Not everything should live forever. A wiki that never forgets becomes noisy — important signals buried under outdated context. Implement a retention curve: facts that were important once but haven't been accessed or reinforced in months gradually fade to "archived" status. The lint pass executes this curve automatically.
Frontmatter fields to add: last_accessed, access_count, status: active|fading|archived. The lint pass updates status based on time-since-access and reinforcement count. Archived pages aren't deleted — they move to wiki/archive/ where they're out of the active index but still traceable.
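One way the lint pass could derive `status` from those fields. The day thresholds and the reinforcement bonus are illustrative, not values from the pattern:

```python
import datetime as dt

# Retention curve sketch: status follows days since last access, softened
# by how often the page has been reinforced (capped so nothing is immortal).
def retention_status(last_accessed: dt.date, access_count: int,
                     today: dt.date) -> str:
    idle = (today - last_accessed).days
    grace = 30 * min(access_count, 6)    # each reinforcement buys ~a month
    if idle <= 90 + grace:
        return "active"
    if idle <= 180 + grace:
        return "fading"
    return "archived"
```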
The standard Karpathy wiki is fed by sources you manually drop into raw/. Your setup replaces that bottleneck with an automated conversation pipeline: every AI session gets mined into MemPalace, summarized, and fed into raw/ on a continuous basis. The wiki stops being a project you maintain and becomes an organism that grows from your daily work. Combined with qmd replacing ChromaDB for indexing, you have a genuinely novel hybrid that addresses the core limitations differently from any single pattern alone.
Note: You are skipping MemPalace's ChromaDB storage layer and using qmd for indexing instead. The implications of that choice are documented throughout this tab.
Conversation mining + auto-save hooks make the feed automatic. You no longer have to remember to drop files into raw/. Every Claude Code session is mined. The PreCompact hook fires before context compression. The Stop hook fires every 15 messages.
Drawers preserve verbatim originals permanently. When a wiki claim is flagged as low-confidence, you have an exact traceable source to verify against — not just "raw/source-2026-04.md" but a wing-scoped, room-tagged original with a drawer ID.
MemPalace's wing+room metadata filtering means qmd doesn't have to search the entire corpus — it searches a pre-narrowed wing/room scope first. This extends the effective scale ceiling because retrieval is structurally guided before the BM25+vector pass fires.
Conversations are the primary source — they're inherently current. Every session you have becomes a potential ingest. Staleness now depends on how actively you use AI tools (which you do constantly), not on whether you remember to read and clip articles.
The combination of MemPalace structural navigation (wing → room → closet → drawer) plus qmd's BM25+vector search covers both explicit structural navigation and fuzzy semantic matching. You have the best of both retrieval patterns without a full vector database.
Not every conversation deserves to enter the wiki. Debugging rabbit holes, exploratory dead-ends, and casual exchanges are valuable in MemPalace's verbatim drawers but would pollute the wiki if compiled directly. The summarization/filtering step before raw/ is now load-bearing.
| Dimension | qmd (your choice) | ChromaDB (MemPalace default) |
|---|---|---|
| Storage format | Markdown files (same as wiki) | ✦ Proprietary vector DB |
| Semantic recall (LongMemEval) | Not benchmarked on this task | ✦ 96.6% R@5 raw mode |
| Wiki integration | ✦ Native — indexes wiki/ directly | Separate store, no wiki awareness |
| Single index to maintain | ✦ Yes — one qmd collection | No — wiki + ChromaDB separate |
| MCP exposure | ✦ qmd mcp — native tool for Claude | Via MemPalace MCP server |
| Hybrid search (BM25 + vector) | ✦ Built in — qmd query | ChromaDB semantic only |
| Dependencies | ✦ npm only, local GGUF model | Python, chromadb, potential version pin issues |
| Verbatim drawer retrieval | Not designed for this | ✦ Core feature — drawers are ChromaDB entries |
| Architectural simplicity | ✦ One search layer for everything | Two parallel search systems |
| Limitation | Before MemPalace | With MemPalace + qmd | Residual Work |
|---|---|---|---|
| Active Upkeep | Manual — wikis rot | ✦ Auto-hooks feed continuously | Summarization quality tuning |
| Error Persistence | No traceable ground truth | ✦ Drawers = verbatim source | Confidence scoring in schema |
| Scale Ceiling | ~50–100K token hard limit | Extended by wing+room filtering | qmd still needed at 200+ articles |
| Semantic Retrieval Gap | Explicit links only | ✦ Structure + qmd BM25+vector | Some ChromaDB recall lost (see above) |
| Knowledge Staleness | Depends on manual curation | ✦ Continuous from session mining | Retention curve still needed |
| Cross-check | Raw docs only, imprecise | ✦ Drawer-level verbatim traceability | fact_checker.py not yet wired (v3) |
| Access Control | Flat file, none | Still needs MCP wrapper layer | Tailscale boundary is your fastest path |
| Cognitive Outsourcing | Valid concern | Unchanged — wiki is still reference only | Design intent: reference, not replacement |
In the original pattern, you curated sources manually — only deliberate, quality inputs entered raw/. With conversation mining, the filter is your summarization scripts. If those scripts surface debugging dead-ends, exploratory rabbit holes, or noise, it enters the wiki compilation pipeline. Garbage-in still applies — it's just at a different point in the flow.
Mitigation: Tune your conversation scripts to filter by memory type (hall_facts and hall_discoveries are high-signal; hall_events is medium; raw session transcripts are low). Only promote closet summaries tagged as decisions, discoveries, or recommendations. Use MemPalace's --extract general mode to auto-classify before staging.
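A sketch of that promotion filter. The hall names mirror the MemPalace taxonomy described in this document; the tiering logic and tag set are assumptions about how a summarization script might apply the rule:

```python
# Signal tiers per memory type, following the mitigation above:
# facts/discoveries are high, events are medium, raw transcripts are low.
SIGNAL = {"hall_facts": "high", "hall_discoveries": "high",
          "hall_events": "medium", "raw_transcript": "low"}

PROMOTABLE_TAGS = {"decision", "discovery", "recommendation"}

def should_stage(memory_type: str, tags: set[str]) -> bool:
    tier = SIGNAL.get(memory_type, "low")
    if tier == "low":
        return False                        # raw transcripts never auto-stage
    if tier == "medium":
        return bool(tags & PROMOTABLE_TAGS)  # events need a promotable tag
    return True                              # high-signal halls stage by default
```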
MemPalace's contradiction detection (fact_checker.py) exists as a standalone utility but is not currently called automatically during knowledge graph operations — the authors acknowledged this in their April 7 correction note. This means cross-wing contradictions won't be auto-flagged at ingest time yet.
Mitigation: Call fact_checker.py manually as part of your lint pass script until Issue #27 is resolved. Wire it as a pre-commit hook on wiki/ changes: any new page goes through fact_checker before being promoted from staging to verified.
MemPalace's taxonomy (wings, rooms, halls) and the wiki's taxonomy (domains, concept pages, page types in CLAUDE.md) are separate schemas. If they drift — MemPalace calls something "wing_taskforge/hall_facts/auth" while the wiki calls it "infrastructure/auth-decisions" — the structural navigation loses coherence. Tunnels and wikilinks stop reinforcing each other.
Mitigation: Define a canonical mapping document (a simple markdown table) that maps MemPalace wing/room names to wiki domain/page paths. Reference it in both CLAUDE.md and your MemPalace wing_config.json. Review quarterly — schemas co-evolve, but they need to co-evolve together.
The first seven extensions came out of the Signal & Noise review. The eighth surfaced only after the other layers were built — and it's the one that makes the MemPalace integration a real pipeline into the wiki instead of just a searchable archive beside it. The mining layer was extracting sessions, classifying bullets into halls, tagging topics, and making everything searchable via qmd. But the knowledge inside the conversations was never being compiled into wiki pages. A decision made in a session, a root cause found during debugging, a pattern spotted in review — these stayed in the conversation summaries forever, findable but not synthesized.
This is what the wiki-distill.py script solves. It's Phase 1a of wiki-maintain.sh and runs before URL harvesting because conversation content should drive the page, not the URLs the conversation cites.
| Hall | Distilled? | Why |
|---|---|---|
| hall_facts | ✦ YES | Decisions locked in, choices made, specs agreed. Canonical knowledge. |
| hall_discoveries | ✦ YES | Root causes, breakthroughs, non-obvious findings. The highest-signal content in any session. |
| hall_advice | ✦ YES | Recommendations, lessons learned, "next time do X." Worth capturing as patterns. |
| hall_events | no | Deployments, incidents, milestones. Temporal data — belongs in logs, not the wiki. |
| hall_preferences | no | User working style notes. Belong in personal configs, not the shared wiki. |
| hall_tooling | no | Script/command usage, failures, improvements. Usually low-signal or duplicates what's already in the wiki. |
Each daily run only looks at conversations dated today. It extracts the `topics:` frontmatter field from each; the union of those fields becomes the "topics of today" set. If you didn't discuss a topic today, it's not in the processing scope. This keeps the cron job cheap and predictable: if today was a light session day, distill runs fast. If today was a heavy architecture discussion, distill does real work.
Once the today-topic set is known, for each topic the script walks the entire conversation archive and pulls every summarized conversation that shares that topic. A discussion about blue-green-deploy today might roll up 16 conversations across the last 6 months. The claude -p call sees the full history, not just today's fragment.
This is what makes the distilled pages good. The LLM isn't guessing what a pattern looks like from one session — it's synthesizing across everything you've ever discussed on the topic.
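The narrow-today/wide-history scoping can be sketched as follows, assuming date-prefixed filenames like `2026-04-12-<id>.md` and a single `topics:` line in each conversation's frontmatter (both conventions as described above):

```python
from pathlib import Path

# Read the comma-separated `topics:` frontmatter line from a conversation file.
def read_topics(path: Path) -> set[str]:
    for line in path.read_text().splitlines():
        if line.startswith("topics:"):
            return {t.strip() for t in line.split(":", 1)[1].split(",")
                    if t.strip()}
    return set()

# Topics touched today define the scope; each topic then rolls up its
# FULL history across the archive, not just today's fragment.
def todays_scope(archive: Path, today: str) -> dict[str, list[Path]]:
    convos = list(archive.rglob("*.md"))
    today_sets = [read_topics(p) for p in convos if p.name.startswith(today)]
    today_topics = set().union(*today_sets) if today_sets else set()
    return {t: [p for p in convos if t in read_topics(p)]
            for t in today_topics}
```

A topic mentioned once today pulls in every past conversation sharing it, which is exactly the dormant-topic wake-up behavior described above.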
The narrow-today/wide-history combination produces a useful emergent property: dormant topics wake up automatically. If you discussed database-migrations three months ago and it never came up again, it's not in the daily scope. But the day you mention it again in any new conversation, that topic enters today's set — and the rollup pulls in all three months of historical discussion. The wiki page gets updated with fresh synthesis across the full history without you having to manually trigger reprocessing.
A conversation is considered "already distilled" only if its body hash AND its topic set match what was seen at the last distill. If the body changes (summarizer re-ran and updated the bullets) OR a new topic is added, the conversation gets re-processed on the next run. Topics get tracked so rejected ones don't get reprocessed forever — if the LLM says "this topic doesn't deserve a wiki page" once, it stays rejected until something meaningful changes.
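The "already distilled" check can be sketched as a ledger keyed by conversation ID, storing a hash over body plus sorted topic set; the ledger layout is an assumption:

```python
import hashlib

# A conversation's distill key changes if its body OR its topic set changes.
def distill_key(body: str, topics: set[str]) -> str:
    h = hashlib.sha256()
    h.update(body.encode("utf-8"))
    h.update("|".join(sorted(topics)).encode("utf-8"))
    return h.hexdigest()

def needs_redistill(body: str, topics: set[str],
                    ledger: dict[str, str], convo_id: str) -> bool:
    key = distill_key(body, topics)
    if ledger.get(convo_id) == key:
        return False                     # unchanged since last distill run
    ledger[convo_id] = key               # record what this run saw
    return True
```

Sorting the topics before hashing makes the key order-independent, so re-tagging in a different order doesn't force a spurious re-run.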
The orchestrator runs distill as Phase 1a and harvest as Phase 1b. Deliberate: if a topic is being actively discussed in your sessions, you want the wiki page to reflect your synthesis of what you've learned, not just the external URL cited in passing. URL harvesting then fills in gaps — it picks up the docs pages, blog posts, and references that your sessions didn't already cover.
When distill creates patterns/docker-hardening.md and harvest also creates patterns/docker-hardening.md, the staging-unique-path helper appends a short hash suffix so they don't collide. The reviewer sees both in staging and picks the better one (usually distill, since it has historical context).

Every distilled page lands in staging with full provenance in its frontmatter. When you review a page in staging, you can see exactly which conversations it came from and jump directly to those transcripts.
```yaml
---
origin: automated
status: pending
staged_date: 2026-04-12
staged_by: wiki-distill
target_path: patterns/zoho-crm-integration.md
distill_topic: zoho-api
distill_source_conversations: conversations/general/2026-04-06-73d15650.md,conversations/mc/2026-03-30-64089d1d.md
compilation_notes: Two separate incidents discovered the same Zoho CRM v2 API
  limitations; documenting them as a pattern page prevents re-investigation and
  provides a canonical reference for future Zoho integrations.
title: Zoho CRM Integration
type: pattern
confidence: high
sources: [conversations/general/2026-04-06-73d15650.md, conversations/mc/2026-03-30-64089d1d.md]
related: [database-migrations.md, activity-event-auditing.md]
last_compiled: 2026-04-12
last_verified: 2026-04-12
---
```