Improving Autoformalization Using Direct Dependency Retrieval

Shaoqi Wang; Lu Yu; Siwei Lou; Feng Yan; Chunjie Yang; Qing Cui; Jun Zhou

arXiv:2511.11990·cs.AI·January 5, 2026

TL;DR

This paper introduces DDR, a novel retrieval-augmented framework for statement autoformalization that enhances dependency retrieval accuracy and scalability, significantly improving formal library dependency identification from natural language descriptions.

Contribution

The paper presents DDR, a new method for direct dependency retrieval that outperforms existing approaches in precision, recall, and scalability for autoformalization tasks.

Findings

1. DDR achieves higher retrieval precision and recall than state-of-the-art methods.

2. The dependency retrieval dataset contains over 500,000 samples.

3. An autoformalizer equipped with DDR shows better accuracy and stability in experiments.
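To make the precision and recall claims concrete: for a single statement, precision is the fraction of retrieved dependencies that are true dependencies, and recall is the fraction of true dependencies that were retrieved. The sketch below is a minimal illustration that assumes dependencies are compared as exact identifier strings. The Mathlib-style names are invented for the example and are not taken from the paper's evaluation.

```python
def precision_recall(retrieved: set[str], gold: set[str]) -> tuple[float, float]:
    """Set-based precision and recall for one statement's library dependencies."""
    hits = len(retrieved & gold)  # dependencies that were both retrieved and needed
    # Convention (an assumption): report 0.0 when a set is empty instead of dividing by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Illustrative identifiers only; not data from the paper.
retrieved = {"Nat.Prime", "Nat.factorial", "Real.sqrt"}
gold = {"Nat.Prime", "Nat.factorial", "Finset.sum"}
p, r = precision_recall(retrieved, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```

Averaging these per-statement values over a test set is one common way to report dataset-level scores. The paper's exact aggregation is not stated here.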

Abstract

The convergence of deep learning and formal mathematics has spurred research in formal verification. Statement autoformalization, a crucial first step in this process, aims to translate informal descriptions into machine-verifiable representations, but it remains a significant challenge. Existing methods often lack contextual awareness, which leads them to hallucinate formal definitions and theorems. Furthermore, current retrieval-augmented approaches show poor precision and recall when retrieving formal library dependencies, and they lack the scalability to effectively leverage ever-growing public datasets. To bridge this gap, we propose a novel retrieval-augmented framework for statement autoformalization based on DDR (Direct Dependency Retrieval). Our DDR method directly generates candidate library dependencies from natural…
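The abstract's central idea is to generate candidate library dependencies directly from the natural-language statement rather than ranking every library entry by similarity. The sketch below is a minimal, hypothetical illustration of that idea. `generate_candidates` is a made-up stand-in for a trained generator; here it is a toy keyword lookup that deliberately emits one hallucinated name. Grounding the candidates by keeping only names that exist in the library is also an assumption. The plain set lookup is not a detail stated above and stands in for whatever existence check the method actually uses.

```python
# Hypothetical sketch: generate dependency names directly, then keep only
# names that exist in a (toy) index of the formal library.

LIBRARY_NAMES = {"Nat.Prime", "Nat.factorial", "Finset.sum", "Real.sqrt"}  # toy library index


def generate_candidates(statement: str) -> list[str]:
    """Toy stand-in for a learned generator: maps keywords to candidate names.

    A real system would decode names with a trained model. This lookup only
    mimics the output format, including one name absent from the toy index.
    """
    keyword_map = {
        "prime": ["Nat.Prime", "Nat.IsPrime"],  # "Nat.IsPrime" is not in LIBRARY_NAMES
        "factorial": ["Nat.factorial"],
        "square root": ["Real.sqrt"],
    }
    text = statement.lower()
    candidates: list[str] = []
    for keyword, names in keyword_map.items():
        if keyword in text:
            candidates.extend(names)
    return candidates


def direct_dependency_retrieval(statement: str, library: set[str]) -> list[str]:
    """Keep generated candidates that exist in the library, in order, without duplicates."""
    seen: set[str] = set()
    result: list[str] = []
    for name in generate_candidates(statement):
        if name in library and name not in seen:  # existence check filters hallucinations
            seen.add(name)
            result.append(name)
    return result


deps = direct_dependency_retrieval(
    "For every prime p, the factorial (p-1)! plus 1 is divisible by p.", LIBRARY_NAMES
)
print(deps)  # ['Nat.Prime', 'Nat.factorial']
```

The generator's cost does not grow with the number of library entries, so only the existence check touches the library. This is one plausible reading of the scalability argument, not a claim about the paper's implementation.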

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Topic Modeling · Natural Language Processing Techniques