The Network Structure of Mathlib
Xinze Li, Nanyun Peng, Simone Severini, Patrick Shafto

TL;DR
This paper analyzes the complex network structure of Lean 4's Mathlib, revealing insights into its dependency graph, taxonomy divergence, and semantic hierarchy compression through network analysis.
Contribution
It introduces a network analysis approach to formalized mathematics, providing new insights into the structure and dependencies within Mathlib.
Findings
Taxonomies diverge from logical structures with 50.9% namespace coupling.
Developers use a median of 1.6% of imported scope.
Formalization compresses semantic hierarchies, with network centrality reflecting infrastructure.
Abstract
The ongoing development of Lean 4's Mathlib has produced a macroscopic structural complexity that interweaves logical, mathematical, and infrastructural dependencies. We present a network analysis of this library, extracting its dependency structure into a multilayer graph of 308,129 declarations, 8.4 million edges, and 7,563 modules. By introducing graph decompositions that isolate explicit edges from those synthesized by the compiler or driven by proofs, we quantify the structural properties of formalized mathematics. Our analysis reveals three findings. First, taxonomies designed by humans diverge from logical structures, exhibiting a 50.9% coupling across namespaces. Second, developers utilize a median of 1.6% of the imported scope. Third, formalization compresses semantic hierarchies, with network centrality capturing language infrastructure rather than mathematical relevance.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
