TheoremGraph

Bridging formal and informal mathematics

Simon Kurgan, Evan Wang, Eric Leonen, Sophie Szeto, Luke Alexander, Artemii Remizov, Jarod Alper, Giovanni Inchiostro, Vasily Ilin

The problem

Same theorem. Different worlds.

Humans see the equivalence. The machine sees unrelated strings.

The landscape

Informal is broad. Formal is exact.

11.75M paper statements · 388,105 Lean declarations · almost no bridge between them.

The object

A graph of facts and dependencies.

Nodes are statements. Arrows mean one statement leans on another.
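
A minimal Python sketch of this object. It assumes statements are keyed by string IDs; the IDs below are hypothetical. Each node maps to the statements it leans on, so a breadth-first walk recovers everything a result depends on, directly or indirectly.

```python
from collections import deque


def transitive_dependencies(graph: dict[str, list[str]], start: str) -> set[str]:
    """Return every statement that `start` leans on, directly or indirectly.

    `graph` maps a statement ID to the IDs of its direct dependencies.
    """
    seen: set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; this also keeps cycles from looping forever
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


# Hypothetical toy graph: a theorem using two lemmas that share one definition.
toy_graph = {
    "thm:main": ["lem:bound", "lem:limit"],
    "lem:bound": ["def:norm"],
    "lem:limit": ["def:norm"],
    "def:norm": [],
}

print(sorted(transitive_dependencies(toy_graph, "thm:main")))
# ['def:norm', 'lem:bound', 'lem:limit']
```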

Informal graph

Recover links that papers only imply.

Author references, prose cues, and notation links each contribute a different precision/coverage tradeoff.
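
A rough sketch of the first two signals, using regular expressions over one proof's LaTeX source. The patterns, the method tags, and `extract_dependencies` are all hypothetical, not the pipeline's actual extractor. Notation links are not covered, and real papers need much sturdier parsing (custom macros, `\input` files, theorem numbers that must be resolved back to labels).

```python
import re

# Explicit cross-references the author wrote, e.g. \ref{lem:bound} or \cref{a,b}.
REF_PATTERN = re.compile(r"\\(?:ref|cref|Cref|autoref)\{([^}]*)\}")

# Prose cues such as "by Lemma 3.2" or "from Theorem 4" (numbers, not labels).
CUE_PATTERN = re.compile(
    r"\b(?:by|from|using)\s+(Theorem|Lemma|Proposition|Corollary)\s+(\d+(?:\.\d+)*)",
    re.IGNORECASE,
)


def extract_dependencies(proof_tex: str) -> list[tuple[str, str]]:
    """Return (target, method) pairs found in one proof's LaTeX source."""
    edges: list[tuple[str, str]] = []
    for match in REF_PATTERN.finditer(proof_tex):
        # \cref may list several comma-separated labels.
        for label in match.group(1).split(","):
            if label.strip():
                edges.append((label.strip(), "author_reference"))
    for kind, number in CUE_PATTERN.findall(proof_tex):
        edges.append((f"{kind.capitalize()} {number}", "prose_cue"))
    return edges


proof = r"By Lemma~\ref{lem:bound} and \cref{lem:limit,def:norm}, the claim follows from Theorem 2.1."
for edge in extract_dependencies(proof):
    print(edge)
```

Tagging each edge with the method that found it keeps the precision/coverage tradeoff visible downstream.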

Validation

Keep the signal, keep the provenance.

Every edge preserves how it was found, so users can trade precision for coverage.
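
One way to picture provenance-preserving edges, as a minimal sketch: each edge carries a `method` tag, and a user picks which methods to trust. The `Edge` class, the method names, and `filter_edges` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str  # the statement that leans on another
    target: str  # the statement it leans on
    method: str  # how the edge was found (provenance)


def filter_edges(edges: list[Edge], allowed_methods: set[str]) -> list[Edge]:
    """Keep only edges found by methods the user trusts."""
    return [e for e in edges if e.method in allowed_methods]


edges = [
    Edge("thm:main", "lem:bound", "author_reference"),
    Edge("thm:main", "lem:limit", "prose_cue"),
    Edge("thm:main", "def:norm", "notation_link"),
]

# Higher precision: only links the author stated explicitly.
strict = filter_edges(edges, {"author_reference"})
# Higher coverage: also accept inferred links.
broad = filter_edges(edges, {"author_reference", "prose_cue", "notation_link"})
print(len(strict), len(broad))  # 1 3
```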

Formal graph

In Lean, the graph is already written.

LeanGraph reads checked declarations and the names they depend on.

The wall

Brute force is impossible.

385,657 searched formal declarations against 11.75M informal statements gives roughly 4.5T possible pairs.
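
The scale follows from multiplying the two counts. To make "impossible" concrete, the sketch also assumes a hypothetical judge throughput of 1,000 verdicts per second; that rate is illustrative only and not a figure from the talk.

```python
formal_declarations = 385_657
informal_statements = 11_750_000  # the slide's "11.75M", itself a rounded count

pairs = formal_declarations * informal_statements
print(f"{pairs:.2e} possible pairs")

# Hypothetical judge throughput, chosen only to illustrate the scale.
judgments_per_second = 1_000
seconds_per_year = 365.25 * 24 * 3600
years = pairs / judgments_per_second / seconds_per_year
print(f"about {years:,.0f} years to judge every pair")
```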

Step 1

Use slogans as a shared medium.

Lean code and LaTeX prose become short natural-language approximations.

Step 2

Meaning becomes a place.

Qwen3-Embedding-8B maps each slogan into a shared semantic space.
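
Once slogans are vectors, "nearby" usually means high cosine similarity. This sketch does not call Qwen3-Embedding-8B; the three-dimensional vectors are toy stand-ins for real embeddings, and the function names are hypothetical.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a nonzero vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the dot product of the two unit vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


# Toy stand-ins; a real embedder returns much longer vectors.
formal_slogan_vec = [0.9, 0.1, 0.0]
informal_slogan_vec = [0.8, 0.2, 0.1]
unrelated_vec = [0.0, 0.1, 0.9]

print(round(cosine(formal_slogan_vec, informal_slogan_vec), 3))
print(round(cosine(formal_slogan_vec, unrelated_vec), 3))
```

Normalizing once up front lets an index compare vectors by plain dot product.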

Step 3

Search nearby, not every pair.

HNSW gives a layered map of the embedding space: sparse at the top, denser below.
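
HNSW is approximate: it can miss some true nearest neighbors. A common sanity check compares its results with exact search on a small sample. The sketch below is that exact baseline plus a recall@k measure, not HNSW itself; the function names and toy corpus are hypothetical.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_top_k(query: list[float], corpus: dict[str, list[float]], k: int) -> list[str]:
    """Brute-force k nearest neighbors by cosine similarity (the ground truth)."""
    return heapq.nlargest(k, corpus, key=lambda key: cosine(query, corpus[key]))


def recall_at_k(approx: list[str], exact: list[str]) -> float:
    """Fraction of the true top-k that the approximate index also returned."""
    return len(set(approx) & set(exact)) / len(exact)


corpus = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
truth = exact_top_k([1.0, 0.0], corpus, k=2)
approx = ["a", "c"]  # pretend output from an approximate index
print(truth, recall_at_k(approx, truth))
```

Exact search is only affordable on a sample. That is the point: it measures what the fast index gives up.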

Candidate pairs

Retrieval gives leads, not answers.

Each row contains the formal statement, the informal statement, their slogans, and the cosine score.

Cutoff

Choose the floor by yield.

Below 0.8, the judge mostly rejects candidates: it affirms only 23% of pairs in 0.7-0.8 and 2% in 0.6-0.7.

Step 4

The judge reads the full record.

Both slogans, the Lean signature, the arXiv title, and the original LaTeX statement. Not the cosine score.
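
A minimal sketch of assembling the judge's input from one candidate record while leaving the score out. Field names, labels, and the example record are hypothetical. The slide does not say why the score is withheld; one plausible reason is to keep retrieval confidence from anchoring the verdict.

```python
# (label shown to the judge, record key); the cosine score is intentionally absent.
JUDGE_FIELDS = [
    ("Formal slogan", "formal_slogan"),
    ("Informal slogan", "informal_slogan"),
    ("Lean signature", "lean_signature"),
    ("arXiv title", "arxiv_title"),
    ("LaTeX statement", "latex_statement"),
]


def build_judge_input(record: dict) -> str:
    """Assemble the judge's view of one candidate pair, without the cosine score."""
    return "\n\n".join(f"{label}:\n{record[key]}" for label, key in JUDGE_FIELDS)


# Illustrative record; every value, including the theorem name, is made up.
record = {
    "formal_slogan": "Every prime number is at least two.",
    "informal_slogan": "Primes are bounded below by 2.",
    "lean_signature": "theorem example_two_le {p : ℕ} (hp : p.Prime) : 2 ≤ p",
    "arxiv_title": "(hypothetical paper title)",
    "latex_statement": r"If $p$ is prime, then $p \geq 2$.",
    "cosine": 0.87,
}
print(build_judge_input(record))
```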

Matches

47,952 affirmed matches.

100,799 valid judge labels above the 0.8 bar; exact and inexact labels both count as matches.
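
Counting works like this: any label in the match set counts, whether exact or inexact. The label strings and `count_matches` are hypothetical. The last line just divides the two published figures.

```python
from collections import Counter

MATCH_LABELS = {"exact", "inexact"}  # both count as affirmed matches


def count_matches(labels: list[str]) -> tuple[int, int]:
    """Return (matches, total) over a list of valid judge labels."""
    tally = Counter(labels)
    matches = sum(tally[label] for label in MATCH_LABELS)
    return matches, sum(tally.values())


print(count_matches(["exact", "inexact", "no_match", "exact"]))  # (3, 4)

# Share of valid labels that are matches, from the figures above.
print(f"{47_952 / 100_799:.1%}")  # 47.6%
```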

TheoremGraph

Two dependency graphs, joined by semantic bridges.
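
A minimal sketch of one cross-world query: list each direct dependency of a paper statement along with the Lean declarations it was matched to. Empty lists show where the formal side has no counterpart yet. All IDs and names are hypothetical.

```python
def formal_coverage(
    informal_deps: dict[str, list[str]],
    bridges: dict[str, list[str]],
    statement: str,
) -> dict[str, list[str]]:
    """Map each direct dependency of `statement` to its matched Lean declarations.

    `informal_deps` is the paper-side dependency graph; `bridges` maps an
    informal statement ID to the Lean names affirmed as matches.
    """
    return {dep: bridges.get(dep, []) for dep in informal_deps.get(statement, [])}


informal_deps = {"paper:thm1": ["paper:lem2", "paper:lem3"]}
bridges = {"paper:lem2": ["Hypothetical.lemma_two"]}

print(formal_coverage(informal_deps, bridges, "paper:thm1"))
```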

The result is not just a table of pairs. It is a navigable map across the paper world and the checked world.

What remains open

Better representations are the next problem.

  • Make sloganization less lossy.
  • Test whether embedding distances reflect real mathematical structure.
  • Compare embedders and judges systematically.
  • Use blind human evaluation as the final gate.

Thanks

Questions?

TheoremGraph is a starting point: graph infrastructure, evaluation, and representation research for formal/informal mathematics.

arXiv:2606.25363
Paper
TheoremGraph: Bridging Formal and Informal Mathematics
Simon Kurgan · Evan Wang · Eric Leonen · Sophie Szeto · Luke Alexander · Artemii Remizov · Jarod Alper · Giovanni Inchiostro · Vasily Ilin