Semantic Search over 9 Million Mathematical Theorems
Luke Alexander, Eric Leonen, Sophie Szeto, Artemii Remizov, Ignacio Tejeda, Jarod Alper, Giovanni Inchiostro, Vasily Ilin

TL;DR
This paper introduces a large-scale semantic search system for mathematical theorems, demonstrating its effectiveness in retrieving specific theorems from a vast corpus of research-level mathematical statements.
Contribution
It presents the largest publicly available corpus of research-level theorems and systematically analyzes factors affecting semantic theorem retrieval quality.
Findings
Semantic search significantly improves theorem retrieval accuracy.
Representation choices impact retrieval performance.
The system is effective at web-scale theorem search.
Abstract
Searching for mathematical results remains difficult: most existing tools retrieve entire papers, while mathematicians and theorem-proving agents often seek a specific theorem, lemma, or proposition that answers a query. While semantic search has seen rapid progress, its behavior on large, highly technical corpora such as research-level mathematical theorems remains poorly understood. In this work, we introduce and study semantic theorem retrieval at scale over a unified corpus of million theorem statements extracted from arXiv and seven other sources, representing the largest publicly available corpus of human-authored, research-level theorems. We represent each theorem with a short natural-language description as a retrieval representation and systematically analyze how representation context, language model choice, embedding model, and prompting strategy affect retrieval…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMathematics, Computing, and Information Processing · History and Theory of Mathematics · Logic, programming, and type systems
