Learning an Effective Premise Retrieval Model for Efficient Mathematical Formalization
Yicheng Tao, Haotian Liu, Shanwen Wang, Hongteng Xu

TL;DR
This paper presents a novel premise retrieval model for formalized mathematics, trained with contrastive learning on formal corpora. The model outperforms existing methods in both accuracy and efficiency and is released as an open-source search engine.
Contribution
The paper introduces a contrastive learning-based premise retrieval model trained on Mathlib data, improving accuracy and efficiency for formalized mathematics retrieval tasks.
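The contribution above rests on contrastive learning. One common way to realise it for retrieval is an in-batch InfoNCE loss, sketched here in plain Python as an illustration only. The abstract does not specify the paper's exact loss, so the function names, the cosine scoring, and the temperature of 0.05 are all assumptions. Each proof-state embedding is pulled toward its paired premise and pushed away from every other premise in the same batch.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries: List[Vector], premises: List[Vector],
                  temperature: float = 0.05) -> float:
    """In-batch contrastive loss (assumed formulation, not the paper's).

    premises[i] is the positive for queries[i]; every other premise in the
    batch serves as a negative. Returns the mean of -log softmax(positive).
    """
    assert len(queries) == len(premises), "one positive premise per query"
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in premises]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_loss(states, [[1.0, 0.0], [0.0, 1.0]]))  # aligned pairs
    print(info_nce_loss(states, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs
```

With aligned pairs the loss is close to zero; swapping the positives drives it up sharply, which is the signal that training minimises.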
Findings
Outperforms existing baseline retrieval methods.
Achieves higher accuracy with lower computational load.
Provides an open-source search engine and code for community use.
Abstract
Formalized mathematics has recently garnered significant attention for its ability to assist mathematicians across various fields. Premise retrieval, as a common step in mathematical formalization, has been a challenge, particularly for inexperienced users. Existing retrieval methods that facilitate natural language queries require a certain level of mathematical expertise from users, while approaches based on formal languages (e.g., Lean) typically struggle with the scarcity of training data, hindering the training of effective and generalizable retrieval models. In this work, we introduce a novel method that leverages data extracted from Mathlib to train a lightweight and effective premise retrieval model. In particular, the proposed model embeds queries (i.e., proof states provided by Lean) and premises in a latent space, featuring a tokenizer specifically trained on formal corpora.…
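At inference time, a retriever that embeds queries and premises in a shared latent space can embed the proof state once and rank precomputed premise embeddings by similarity. This sketch is hypothetical throughout: `tokenize` is a crude regex split standing in for the tokenizer trained on formal corpora, `toy_embed` is a hashed bag of tokens standing in for the learned encoder, and the three Lean statements are illustrative index entries, not the paper's data.

```python
import math
import re
import zlib
from typing import Dict, List, Tuple

DIM = 256  # hypothetical embedding size


def tokenize(text: str) -> List[str]:
    """Crude stand-in for a tokenizer trained on formal corpora:
    split into identifier runs and single non-space symbols."""
    return re.findall(r"\w+|[^\w\s]", text)


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for the trained encoder: a hashed bag of tokens,
    L2-normalised so that a dot product equals cosine similarity."""
    vec = [0.0] * dim
    for token in tokenize(text):
        # crc32 is deterministic across runs, unlike Python's salted hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_index(premises: Dict[str, str]) -> List[Tuple[str, List[float]]]:
    """Embed every premise once, offline."""
    return [(name, toy_embed(statement)) for name, statement in premises.items()]


def retrieve(proof_state: str, index: List[Tuple[str, List[float]]],
             k: int = 3) -> List[Tuple[str, float]]:
    """Return the k premises with the highest cosine similarity to the query."""
    q = toy_embed(proof_state)
    scored = [(name, sum(a * b for a, b in zip(q, p))) for name, p in index]
    scored.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep index order
    return scored[:k]


if __name__ == "__main__":
    premises = {
        "Nat.add_comm": "∀ (n m : ℕ), n + m = m + n",
        "Nat.mul_comm": "∀ (n m : ℕ), n * m = m * n",
        "Nat.add_zero": "∀ (n : ℕ), n + 0 = n",
    }
    index = build_index(premises)
    for name, score in retrieve("a b : ℕ ⊢ a + b = b + a", index, k=2):
        print(f"{name}: {score:.3f}")
```

Precomputing premise embeddings keeps the per-query cost to one encoder call plus a similarity scan; at Mathlib scale, an approximate nearest-neighbour index would typically replace the linear scan.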
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Intelligent Tutoring Systems and Adaptive Learning
Methods: Attention Is All You Need · Layer Normalization · Dense Connections · Softmax · Linear Warmup With Linear Decay · Adam · Residual Connection · Dropout · WordPiece
