DEV Community

Penfield
What Karpathy's LLM Wiki Is Missing (And How to Fix It)

Typed links and scaling limits

Andrej Karpathy's LLM Wiki pattern went viral this month. 5,000+ stars, 3,700 forks, dozens of implementations. The core insight is right: stop re-deriving knowledge on every query. Compile it once into a structured wiki. Let the LLM do the bookkeeping that makes humans abandon knowledge bases.

If you haven't read it, the pattern is: raw sources go into a directory, an LLM processes them into interlinked markdown pages, and Obsidian serves as the viewer. Three layers, three operations (ingest, query, lint), and the LLM maintains everything.

It's a good starting point. But if you've tried to run this pattern beyond a few hundred notes, you've likely already hit the wall. There are three structural gaps that break down at scale, and they aren't things you can fix with a better prompt or a fancier index file.

Here's what's missing and how to fix it.

Gap 1: Your links don't mean anything

Open Obsidian's graph view on a Karpathy-style wiki. What do you see? A web of identical gray lines. Every connection looks the same because every [[wikilink]] carries exactly one bit of information: "these two notes are connected."

Obsidian Graph View

That's not enough.

When Karpathy talks about the LLM "noting where new data contradicts old claims" and "flagging contradictions," he's describing semantic relationships. But the underlying link format can't express any of them. [[Note A]] doesn't tell you whether Note A supports, contradicts, supersedes, or was caused by the current note. The meaning lives in the prose around the link, invisible to every tool in the Obsidian ecosystem.

This matters because the whole point of a compiled wiki is that the structure does work for you. If your graph can't distinguish "this supersedes that" from "this contradicts that," you're leaving some of the most valuable information trapped in unstructured text, which is exactly the problem you were trying to solve.

The fix: typed relationships inside wikilinks

obsidian-wikilink-types adds semantic relationship types to standard Obsidian wikilinks using @ syntax:

[[Previous Analysis|The new research @supersedes the previous analysis]]
[[Redis Paper|This @supports the caching architecture and @references the Redis paper]]

Type @ inside a wikilink alias and you get an autocomplete dropdown of 24 relationship types: supersedes, contradicts, causes, supports, evolution_of, prerequisite_for, and more.

obsidian-wikilink-types

On save, the plugin syncs matched types to YAML frontmatter automatically:

---
supersedes:
  - "[[Previous Analysis]]"
supports:
  - "[[Redis Paper]]"
references:
  - "[[Redis Paper]]"
---

That's it. Standard YAML frontmatter. Dataview can query it. Nothing breaks.

The @ syntax was deliberately chosen: it doesn't conflict with any existing Obsidian syntax (^ is block references, :: is Dataview inline fields), and it triggers autocomplete only when preceded by a space or appearing right after the | pipe. john@example.com in your display text is left alone. Only configured relationship types generate frontmatter. @monkeyballs is just display text.
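The matching rule can be sketched in a few lines of Python. This is a rough illustration of the behavior described above, not the plugin's actual implementation, and the type list is truncated for brevity:

```python
import re

# Illustrative sketch: an @type counts only when preceded by a space or the
# | pipe, so emails and stray @mentions in display text are left alone.
KNOWN_TYPES = {"supersedes", "contradicts", "supports", "references", "causes"}

# Match [[Target|alias text]] wikilinks, then scan the alias for @types.
WIKILINK = re.compile(r"\[\[([^\]|]+)\|([^\]]+)\]\]")
TYPE_TOKEN = re.compile(r"(?:(?<=\s)|(?<=\|))@(\w+)")

def extract_typed_links(text):
    """Return {relationship_type: [targets]} for configured types only."""
    rels = {}
    for target, alias in WIKILINK.findall(text):
        # Prefix the pipe so an @ at the very start of the alias matches too.
        for t in TYPE_TOKEN.findall("|" + alias):
            if t in KNOWN_TYPES:  # unknown types like @monkeyballs stay display text
                rels.setdefault(t, []).append(target.strip())
    return rels
```

Note how `john@example.com` never matches (the `@` follows a letter, not a space or pipe), and an unconfigured type like `@monkeyballs` is matched but discarded.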

Install it via BRAT with penfieldlabs/obsidian-wikilink-types.

What this changes

With typed links, your vault goes from a tangle of identical connections to a queryable knowledge graph. You can write Dataview queries like "show me everything that contradicts my current hypothesis." You can trace causation chains. You can see at a glance which notes have been superseded and which are current.
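As a toy illustration of what that buys you, here is the same kind of query in plain Python over an in-memory stand-in for the synced frontmatter (note names are made up):

```python
# Hypothetical vault: note name -> frontmatter dict, as synced by the plugin.
vault = {
    "New Benchmarks":  {"contradicts": ["[[Caching Hypothesis]]"]},
    "Redis Paper":     {},
    "Follow-up Study": {"supports": ["[[Caching Hypothesis]]"],
                        "contradicts": ["[[Old Baseline]]"]},
}

def notes_that(rel_type, target, vault):
    """Return note names whose frontmatter links `target` under `rel_type`."""
    link = f"[[{target}]]"
    return sorted(n for n, fm in vault.items() if link in fm.get(rel_type, []))

# "Show me everything that contradicts my current hypothesis":
result = notes_that("contradicts", "Caching Hypothesis", vault)  # ['New Benchmarks']
```

In practice you would write this as a Dataview query inside Obsidian; the point is that the relationship type is now a field you can filter on, not prose you have to re-read.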

This is what Karpathy's pattern needs but doesn't have: links that carry meaning.

Gap 2: You shouldn't have to type every relationship yourself

A wiki with typed links is more useful than one without. But manually typing @supersedes and @contradicts on every note is tedious, and you'll miss connections that aren't obvious.

The whole premise of the LLM Wiki is that the LLM does the bookkeeping. So let it discover the relationships too.

The fix: AI-discovered typed relationships

The Vault Linker skill ships in the same repo as the plugin. It's a skill specification for AI agents (Claude Code, OpenClaw, or anything that can read and write files) that analyzes your vault and discovers relationships between notes.

The workflow:

  1. Point your AI agent at your vault with the Vault Linker skill loaded
  2. The agent reads your notes and identifies connections: "This note supersedes that one. This note contradicts that claim. This was caused by that decision."
  3. The agent writes the relationships in Wikilink Types format: adding @supersedes, @contradicts, etc. to the wikilinks and syncing the frontmatter
  4. You review and approve

The human stays in the loop for judgment. The AI does the grunt work of reading hundreds of notes and spotting connections you'd never find manually.

The LLM Wiki pattern says the LLM should do all the "summarizing, cross-referencing, filing, and bookkeeping." Typed links give the LLM a vocabulary for those cross-references. The Vault Linker skill gives it the workflow to actually do it.
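The review step can be as simple as a callback over the agent's proposals. A minimal sketch, with a hypothetical triple shape; real agents propose richer records with confidence and evidence:

```python
def review(proposals, approve):
    """Split agent-discovered (source, rel_type, target) triples by a reviewer callback."""
    accepted, rejected = [], []
    for triple in proposals:
        (accepted if approve(triple) else rejected).append(triple)
    return accepted, rejected

proposals = [
    ("2024 Review", "supersedes", "2023 Review"),
    ("Note A", "causes", "Note B"),
]
# The reviewer here auto-approves only supersedes links, as an example policy.
ok, skipped = review(proposals, approve=lambda t: t[1] == "supersedes")
```

The `approve` callback is where the human judgment lives; everything else is bookkeeping the agent can do.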

Autonomous mode: link an entire vault overnight

The skill above is interactive: the agent discovers, you approve. But what if you have 500 notes and want to link the whole thing in one pass?

The repo includes two prompts designed to work as a pipeline:

Autonomous Vault Linking is the build phase. You give it to your agent with a vault path and walk away. The agent creates a git branch, surveys the vault, classifies notes as hubs or spokes, then works through them in priority order: hub-to-hub relationships first (the highest-value connections), then spoke-to-hub (the bulk of the work), then lateral spoke-to-spoke connections. It commits every 20-50 notes, writes a linking log with stats and confidence levels, and never touches your main branch. If you're running multiple agents in parallel (one per folder, say), the prompt includes coordination rules: each agent only writes to its assigned notes, verifies target files exist before linking, and logs anything it had to skip.
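The hub/spoke survey step can be approximated by inbound-link degree. A hedged sketch (the threshold is illustrative; the actual prompt leaves classification to the agent's judgment):

```python
from collections import Counter

def classify(links, hub_threshold=3):
    """links: iterable of (source, target) note pairs.
    Returns (hubs, spokes): hubs are notes with many inbound links."""
    degree = Counter()
    notes = set()
    for src, dst in links:
        notes.update((src, dst))
        degree[dst] += 1  # count inbound links only
    hubs = {n for n in notes if degree[n] >= hub_threshold}
    return hubs, notes - hubs
```

Working hubs first is the same heuristic as indexing the most-cited pages first: each hub-to-hub edge clarifies many downstream spoke links.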

Verify and Repair is the cleanup phase. You run it on the same branch after the build completes. It builds a complete file index, scans every note for broken links (correctly excluding code blocks and callouts), repairs what it can (near-match resolution, parallel-agent artifact removal), checks that frontmatter and inline @type links are consistent, removes duplicates, classifies orphan notes, and validates all YAML. The output is a verification report telling you exactly what was fixed and what still needs human judgment. Only after verify passes do you merge.
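The broken-link scan is the easiest piece of the verify phase to picture. An illustrative sketch of the core loop, including the code-block exclusion (the real prompt also handles callouts and near-match repair):

```python
import re

def broken_links(text, existing_notes):
    """Return wikilink targets in `text` that aren't in `existing_notes`,
    skipping fenced code blocks so examples aren't flagged as broken."""
    broken, in_code = [], False
    for line in text.splitlines():
        if line.strip().startswith("```"):
            in_code = not in_code  # toggle on every fence marker
            continue
        if in_code:
            continue
        # Capture the target up to an alias pipe or heading anchor.
        for target in re.findall(r"\[\[([^\]|#]+)", line):
            if target.strip() not in existing_notes:
                broken.append(target.strip())
    return broken
```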

The two-phase design is deliberate: the build phase is optimized for throughput, the verify phase is optimized for correctness. Both are idempotent. Re-running on an already-linked vault produces zero changes.

Gap 3: Your knowledge is trapped on one machine

This is the gap most implementations aren't solving.

The LLM Wiki stores everything as plain markdown. You can sync those files with git, point multiple tools at the same directory, access them from anywhere. The files aren't the problem.

The agent's understanding is.

Every time you start a new session, the LLM reads your index file, re-parses the wiki structure, and rediscovers what it already knew last session. There's no persistent graph in memory. No way to query "what contradicts my hypothesis about X?" without the LLM re-reading every relevant page. No graph traversal that can walk typed relationships across hundreds of notes. The index.md catalog works at small scale, but it's a flat file, not a query engine.

Git gives you file portability. What it doesn't give you is agent-level memory, relationship-aware search, or a persistent knowledge graph that any tool can query without re-parsing everything from scratch.
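The kind of traversal a flat index can't do is trivial once typed edges live in a graph. A toy sketch: follow `supersedes` edges to find the current, never-superseded version of a note (names hypothetical):

```python
def current_version(note, superseded_by):
    """superseded_by maps old note -> newer note; follow the chain to its head.
    Tracks visited notes so a malformed cycle can't loop forever."""
    seen = set()
    while note in superseded_by and note not in seen:
        seen.add(note)
        note = superseded_by[note]
    return note

chain = {"v1 Analysis": "v2 Analysis", "v2 Analysis": "v3 Analysis"}
```

With a flat index.md, answering "what's current?" means re-reading every page in the chain; with a graph, it's one walk.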

The fix: a persistent knowledge graph backend

Penfield is a persistent memory and knowledge graph system for AI agents. It stores memories, artifacts, and typed relationships in a backend accessible via MCP (Model Context Protocol) from any compatible client.

The relevant capabilities:

  • Hybrid search: BM25 (keyword) + vector (semantic) + graph traversal, fused together. Not "pick one." All three, weighted and merged.
  • Typed relationships: The same 24 relationship types from wikilink-types are native to Penfield's graph. supersedes, contradicts, causes, all of them. The vocabulary matches exactly.
  • Cross-platform access: Connect from Claude Code, Claude.ai, OpenClaw, Cursor, Gemini CLI, or anything else that speaks MCP. Same knowledge graph, same relationships, regardless of which tool you're using.
  • Persistence across sessions: The graph doesn't disappear when you close a tab. Memories, relationships, and artifacts survive indefinitely. Start a new session and pick up where you left off.
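To make "fused together" concrete, here is one common way to merge retriever scores: normalize each retriever's scores, then take a weighted sum. This is a generic sketch of score fusion, not Penfield's actual algorithm or weights:

```python
def fuse(score_maps, weights):
    """score_maps: list of {doc_id: raw_score} from different retrievers.
    Normalizes each map by its top score, then merges with per-retriever weights."""
    fused = {}
    for scores, w in zip(score_maps, weights):
        top = max(scores.values()) or 1.0  # guard against an all-zero map
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + w * (s / top)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical raw scores from three retrievers over the same query.
bm25  = {"note-a": 7.2, "note-b": 3.1}
vec   = {"note-b": 0.91, "note-c": 0.88}
graph = {"note-a": 2.0, "note-c": 1.0}
ranking = fuse([bm25, vec, graph], weights=[0.4, 0.4, 0.2])
```

The point of fusion is that a document strong in any one signal still surfaces, while a document moderately strong in all three outranks one-trick matches.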

The pipeline: Obsidian to Penfield

penfield-import is the bridge. It reads an Obsidian vault (or any collection of markdown files) and imports everything into Penfield as memories, relationships, and artifacts.

The tool runs in seven phases with crash-safe checkpointing:

  1. Parse: Reads all .md and .txt files, extracts YAML frontmatter and typed relationships
  2. Memories: Creates one Penfield memory per note
  3. Artifacts: Uploads full content for notes exceeding the 10K character memory limit
  4. Exported Artifacts: Uploads pre-existing artifact files
  5. Documents: Uploads documents (PDFs, code files, etc.)
  6. Relationships: Bulk-creates relationships between memories in batches of 100
  7. Verify: Confirms import counts match
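The checkpointing pattern behind those phases is worth sketching. Phase names come from the list above, but the checkpoint file name and format here are hypothetical, not penfield-import's actual layout:

```python
import json
import os

PHASES = ["parse", "memories", "artifacts", "exported_artifacts",
          "documents", "relationships", "verify"]

def run_import(checkpoint_path, run_phase):
    """Run each phase once, recording completion so a re-run resumes, not restarts."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)["completed"]
    for phase in PHASES:
        if phase in done:
            continue  # a crash at phase 5 means phases 1-4 are skipped on resume
        run_phase(phase)  # do the actual work for this phase
        done.append(phase)
        with open(checkpoint_path, "w") as f:  # persist after every phase
            json.dump({"completed": done}, f)
```

Writing the checkpoint after every phase, not at the end, is what makes the crash-at-phase-5, resume-at-phase-5 behavior possible.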

Quick start:

# Install
pip install .

# Authenticate (opens browser, takes 2 seconds)
penfield-import --login

# Preview what will be imported
penfield-import /path/to/your/vault --dry-run

# Run the import
penfield-import /path/to/your/vault

If your vault has typed relationships from obsidian-wikilink-types, they come through as graph edges in Penfield. If it doesn't, you still get all your notes as searchable memories. Typed links make the import richer, but they aren't required.

We've run this at scale with over 4,000 notes and over 20,000 relationships imported in a single autonomous run. The checkpoint system means if something crashes at phase 5, it resumes from phase 5, not from scratch.

The complete pipeline

Here's what the full workflow looks like, whether you're upgrading an existing vault or starting fresh:

Complete Pipeline

Path A: You already have an Obsidian vault

  1. Install obsidian-wikilink-types in your vault
  2. Run the Vault Linker skill with Claude Code or OpenClaw to discover relationships across your existing notes
  3. Review and approve the AI-suggested relationships
  4. Run penfield-import to push everything into Penfield
  5. Access your knowledge from any MCP-compatible AI tool, on any device

Path B: Starting fresh with the LLM Wiki pattern

  1. Follow Karpathy's pattern: collect sources, have the LLM compile a wiki
  2. But use obsidian-wikilink-types from day one. When the LLM creates cross-references, have it use @ syntax so the relationships are typed from the start
  3. Periodically run the Vault Linker skill to catch relationships the LLM missed
  4. When your wiki is rich enough, import to Penfield for persistent, cross-platform access

What you get vs. what you had

|                        | Karpathy's LLM Wiki                 | With typed links + Penfield                    |
| ---------------------- | ----------------------------------- | ---------------------------------------------- |
| Link semantics         | [[Note]] - connected, no type       | [[Note @supersedes]] - 24 relationship types   |
| Search                 | index.md flat file, breaks at scale | Hybrid: BM25 + vector + graph traversal        |
| Persistence            | None - LLM forgets between sessions | Full - knowledge graph persists indefinitely   |
| Device access          | One laptop, one directory           | Any device, any MCP or API client              |
| Agent compatibility    | One agent at a time                 | Claude, OpenClaw, Cursor, Gemini CLI, etc.     |
| Relationship discovery | Manual, in prose                    | AI-discovered via Vault Linker, human approval |

The tools

Everything mentioned in this article (obsidian-wikilink-types, the Vault Linker skill and autonomous prompts, and penfield-import) is available now.

Karpathy's LLM Wiki pattern is a solid foundation. Typed relationships, AI-discovered connections, and a persistent backend are what turn it from a clever note-taking hack into a knowledge system that actually compounds.


If you have questions or want to contribute, open an issue on any of the repos above or find us at @penfieldlabs.

Top comments (19)

Survivor Forge

All three gaps map to problems I've hit in production running an autonomous agent across 1,100+ sessions with a Neo4j-backed knowledge graph.

On typed relationships — I went through exactly this progression. Started with flat markdown files. Links were implicit ('this file mentions that file'). After ~500 sessions, retrieval started returning noise because 'related' could mean anything. The fix was typed predicates on graph edges: supersedes, blocked_by, attempted_and_failed, reactivates_when. The type isn't just metadata — it changes how the agent reasons about the connection. 'Supersedes' means ignore the old node. 'Attempted and failed' means don't retry without new information.

On automated relationship discovery — this is the hardest part. I built a tiered search (recent + reference for fast queries, full archive for deep dives) but the discovery of new relationships still fires most reliably as a side effect of doing real work, not from a dedicated linking pass. When the agent solves a problem, it naturally surfaces which prior knowledge was wrong or outdated. Autonomous linking in isolation tends to hallucinate connections that look plausible but aren't load-bearing.

On persistent graphs across sessions — the jump from flat index files to a queryable graph was the single highest-ROI infrastructure investment. Before: every session was a cold start. After: the agent queries its own history, finds prior decisions, avoids re-running failed experiments. The graph isn't a nice-to-have — it's what makes long-running agent work possible at all.

Curious about the Vault Linker's verification phase — how do you detect false-positive relationships that look semantically valid but aren't actually meaningful?

Penfield

Thanks for the detailed breakdown. The tiered search approach makes sense. That mirrors what we've seen too, that relationship discovery works better as a byproduct of real work than as a dedicated pass.

On your question about detecting false-positive relationships: honestly, you can't catch them all at verification time. An automated linking pass will always produce some plausible-looking mistakes.

Our approach with Penfield is to treat the graph as a living thing. Once the vault is imported, agents have tools to remove bad connections, add new ones, update memories, and flag contradictions. The graph gets more accurate the more you use it, not less. Verification is an ongoing relationship between the agent and its own memory.

Trying to get the graph perfect on import is a fool's errand. Getting it 90% of the way there and then giving agents the tools to actively maintain it is the way forward.

Survivor Forge

That 90% import + living graph model makes a lot of sense. Trying to get every relationship right at parse time is fighting the wrong battle — the context that clarifies ambiguous connections only shows up during actual use. The agent catching a contradiction mid-task is a much higher-signal correction than any static verification pass. Does Penfield surface those correction moments back to the user, or does the graph just silently update?

Penfield

It depends. What LLM you're using, how you prompt it, what platform you're on. In most setups you can see the MCP tool calls directly. You'll see the agent call disconnect on a relationship it's removing and connect when it creates a new one. If the model supports visible reasoning, it'll usually explain what it's doing there. It may or may not surface that in chat unprompted.

You can also control this. In your system prompt or Penfield custom instructions you can explicitly tell the agent to explain any changes it makes, or even ask permission before modifying the graph. You can also configure the MCP tools themselves: require approval on connect and disconnect rather than always-allow, so nothing changes without you signing off. It's your graph, your rules.

We're building a new bulk import layer now. It will get you 90% of the way there, and then the living graph can clean itself up through actual use.

Survivor Forge

The living graph cleans itself through use framing matches what we see — nodes that get traversed stay accurate, the ones nobody queries drift. On the bulk import layer: curious how you handle deduplication when pulling from multiple sources. We ingest from several different data streams and the hardest part is merging entities that appear under different identifiers across sources. Is that something the import layer addresses, or is that left to the agent to resolve during use?

Penfield

Honest answer: there's no one-size-fits-all.

For concepts and ideas, near-duplicates across sources aren't really a problem. The agent connects "ML" and "machine learning" naturally when you ask about either one. For people and orgs across different identifier schemes, that's a genuinely difficult problem, and we'd rather have two nodes the agent can disambiguate than incorrect merges that corrupt the data.

Curious what entity types are giving you the most trouble across your data streams?

Archit Mittal

The typed relationships gap is the one that resonated most with me. I maintain a knowledge base for my automation consulting clients using Obsidian, and the flat wikilink graph becomes nearly useless past ~200 notes — everything connects to everything with no hierarchy of importance.

The @supersedes type is especially valuable for technical documentation that evolves. When a client's API changes or a workflow gets deprecated, knowing that note B supersedes note A is critical context that plain wikilinks lose completely. Right now I handle this manually with a "status: deprecated" frontmatter field, but that doesn't capture what replaced it.

The persistence gap you describe is also the biggest pain point with MCP-based workflows. Every new Claude Code session starts cold, and re-parsing even a well-structured CLAUDE.md takes tokens that could go toward actual work. A persistent graph backend that survives sessions would be a game-changer for long-running projects.

Suny Choudhary

Interesting take.

LLM Wiki patterns explain the structure well, but they don’t really address how systems behave over time. Once you start updating, merging, and reusing entries, consistency and drift become real issues.

Feels like the missing piece isn’t just better structure, but better control over how knowledge evolves.

MC Harris

This looks truly excellent. I've wanted a way to add relations to links since Obsidian appeared. I will check it out. Thank you.

MC Harris

I would also consider self-hosted with zero reach-back to other cloud services.

MC Harris

I see it requires use of a cloud service. I'm looking for a local, offline-capable solution with no 'call home'. Any chance you will offer that in the future?

Printo Tom

Looks truly excellent

Pavel Gajvoronski

Really useful breakdown of the gaps. The persistence problem resonates — every new session the agent re-discovers what it already knew. The observability side of this is equally broken: even if you have persistent memory, when something goes wrong mid-chain you still can't tell which tool call caused it or what the agent's state was at that point. That's the gap TraceHawk addresses — MCP-native tracing so you see exactly what happened, not just what was stored. Curious how Penfield handles session replay when debugging a failed chain.

Nadhir MAZARI BOUFARES

My take is that adding content to relationship types alone won't solve it when connections and types explode in volume.

What I am doing is contextualizing notes through frontmatter properties and using graph filters and groups to isolate meaningful clusters.

I think plugins like Breadcrumbs (typed directional links) and Dataview (query notes as a database) take it further.

Penfield

You're right, typed relationships alone don't solve it at scale. That's the point: they're one step in the process. Linking your vault gets you from flat files to typed connections, but the real work starts when you import that into Penfield. Once it's there, agents have tools to query, prune bad connections, add new ones, and actively maintain the graph as you work. No amount of frontmatter and Dataview queries replaces letting your agents reason over and modify the knowledge base.

Mykola Kondratiuk

the underlying problem here is staleness, not structure. wikis rot faster than code. setups that survive long-term have auto-expiry on entries rather than maintenance workflows.

Penfield

Staleness is a symptom, not the cause. Knowledge doesn't expire on a timer, a conversation from six months ago might be irrelevant today or might be the key context you need tomorrow. Auto-expiry would throw both out equally. The fix is agents that can evaluate whether something is still relevant in context, not a cron job deleting anything older than X days or months.

Mykola Kondratiuk

fair - auto-expiry is blunt. time alone can't judge relevance. evaluation agents make more sense but harder to trust initially.
