Matlas: A Semantic Search Engine for Mathematics
Haocheng Ju, Leheng Chen, Peihao Wu, Bryan Dai, Bin Dong

TL;DR
Matlas is a large-scale semantic search engine for mathematical statements that lets mathematicians and AI systems retrieve results from a vast corpus using natural-language queries.
Contribution
The paper introduces Matlas, a novel semantic search engine built on a large, dependency-structured corpus of mathematical statements from diverse sources.
Findings
Built a corpus of 8.07 million statements from 435K peer-reviewed papers and 1.9K textbooks.
Constructed dependency graphs and unfolded statements into self-contained representations.
Developed a semantic retrieval system for natural language search of mathematical results.
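The summary does not say how Matlas unfolds a statement. One simple way to picture "unfolding" over a dependency graph is to prepend every definition or theorem a statement transitively relies on, each exactly once and with dependencies first (a depth-first topological ordering). The sketch below does only that. The function `unfold`, the label strings, and the two dictionaries are hypothetical illustrations rather than the authors' pipeline, which may rewrite or condense dependencies instead of concatenating them.

```python
from typing import Dict, List, Set


def unfold(label: str, statements: Dict[str, str], deps: Dict[str, List[str]]) -> str:
    """Hypothetical unfolding: return the statement for `label` preceded by
    every statement it transitively depends on, each included once,
    dependencies before dependents (depth-first topological order)."""
    order: List[str] = []
    visiting: Set[str] = set()  # nodes on the current DFS path (cycle check)
    done: Set[str] = set()      # nodes already emitted

    def visit(node: str) -> None:
        if node in done:
            return
        if node in visiting:
            raise ValueError(f"cyclic dependency at {node!r}")
        visiting.add(node)
        for dep in deps.get(node, []):
            visit(dep)
        visiting.discard(node)
        done.add(node)
        order.append(node)  # appended only after all its dependencies

    visit(label)
    return "\n".join(f"[{n}] {statements[n]}" for n in order)


# Toy example with made-up labels.
statements = {
    "def:group": "A group is a set with an associative operation, an identity, and inverses.",
    "def:abelian": "A group is abelian if its operation is commutative.",
    "thm:main": "Every group of prime order is abelian.",
}
deps = {"thm:main": ["def:abelian", "def:group"], "def:abelian": ["def:group"]}
print(unfold("thm:main", statements, deps))
```

Because `def:group` is shared by both `thm:main` and `def:abelian`, it appears only once, at the top. The recursive traversal suits small examples. A corpus-scale graph would likely need an iterative version to avoid Python's recursion limit.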
Abstract
Retrieving mathematical knowledge is a central task both in human-driven research (for example, determining whether a result already exists, finding related results, and identifying historical origins) and in emerging AI systems for mathematics, where reliable grounding is essential. However, the scale and structure of the mathematical literature pose significant challenges: results are distributed across millions of documents, and individual statements are often difficult to interpret in isolation due to their dependence on prior definitions and theorems. In this paper, we introduce Matlas, a semantic search engine for mathematical statements. Matlas is built on a large-scale corpus of 8.07 million statements extracted from 435K peer-reviewed papers spanning 1826 to 2025, drawn from a curated set of 180 journals selected using an ICM citation-based criterion, together with 1.9K textbooks.…
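The retrieval model and embedding choice behind Matlas are not described in this summary. As a generic picture of semantic search, the sketch assumes each statement has already been mapped to a vector by some unspecified embedding model, then ranks statements by cosine similarity to the query vector, a common dense-retrieval baseline. The names `cosine`, `top_k`, and the toy three-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          index: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Hypothetical brute-force ranking of pre-embedded statements."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


# Toy vectors standing in for real embeddings (assumed, not from the paper).
index = {
    "thm:a": [1.0, 0.0, 0.0],
    "thm:b": [0.7, 0.7, 0.0],
    "thm:c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]
for label, score in top_k(query, index, k=2):
    print(label, round(score, 3))
```

A brute-force scan is linear in corpus size. At the scale of 8.07 million statements, a system would more plausibly use an approximate nearest-neighbor index, but the ranking principle is the same.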
