Improving Autoformalization Using Direct Dependency Retrieval
Shaoqi Wang, Lu Yu, Siwei Lou, Feng Yan, Chunjie Yang, Qing Cui, Jun Zhou

TL;DR
This paper introduces DDR (Direct Dependency Retrieval), a retrieval-augmented framework for statement autoformalization that improves the precision, recall, and scalability of identifying formal library dependencies from natural language descriptions.
Contribution
The paper presents DDR, a new method for direct dependency retrieval that outperforms existing approaches in precision, recall, and scalability for autoformalization tasks.
Findings
DDR achieves higher retrieval precision and recall than state-of-the-art methods.
The dependency retrieval dataset contains over 500,000 samples.
Autoformalizer with DDR shows better accuracy and stability in experiments.
Abstract
The convergence of deep learning and formal mathematics has spurred research in formal verification. Statement autoformalization, a crucial first step in this process, aims to translate informal descriptions into machine-verifiable representations but remains a significant challenge. The core difficulty lies in the fact that existing methods often suffer from a lack of contextual awareness, leading to hallucination of formal definitions and theorems. Furthermore, current retrieval-augmented approaches exhibit poor precision and recall for formal library dependency retrieval, and lack the scalability to effectively leverage ever-growing public datasets. To bridge this gap, we propose a novel retrieval-augmented framework based on DDR (Direct Dependency Retrieval) for statement autoformalization. Our DDR method directly generates candidate library dependencies from natural…
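As an illustration of the retrieval metrics mentioned above, the following minimal sketch computes precision and recall for a set of predicted library dependencies against a ground-truth set. The function name `precision_recall` and the Mathlib-style identifiers are hypothetical and chosen only for illustration; they are not taken from the paper, whose exact evaluation protocol may differ.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of predicted dependency names against gold names.

    Both arguments are iterables of identifier strings; duplicates are ignored.
    """
    predicted_set = set(predicted)
    gold_set = set(gold)
    hits = len(predicted_set & gold_set)  # dependencies that were correctly retrieved
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(predicted_set) if predicted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical example: identifiers are illustrative, not from the paper's dataset.
predicted = ["Nat.Prime", "Nat.factorial", "Finset.sum"]
gold = ["Nat.Prime", "Finset.sum", "Finset.range", "Nat.succ"]
p, r = precision_recall(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f}")
```

Here two of the three predicted names appear in the ground truth, so precision is 2/3, and two of the four ground-truth names were found, so recall is 1/2.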
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Topic Modeling · Natural Language Processing Techniques
