Consistent Autoformalization for Constructing Mathematical Libraries
Lan Zhang, Xin Quan, Andre Freitas

TL;DR
This paper introduces a coordinated approach combining retrieval, denoising, and auto-correction mechanisms to enhance the consistency and reliability of autoformalization in large mathematical libraries using LLMs.
Contribution
It proposes a novel integrated framework with MS-RAG, denoising, and Auto-SEF to improve autoformalization quality in complex mathematical domains.
Findings
Mechanisms improve syntactic consistency
Results are more terminologically accurate
Semantic coherence is enhanced
Abstract
Autoformalization is the task of automatically translating mathematical content written in natural language to a formal language expression. The growing language interpretation capabilities of Large Language Models (LLMs), including in formal languages, are lowering the barriers for autoformalization. However, LLMs alone are not capable of consistently and reliably delivering autoformalization, in particular as the complexity and specialization of the target domain grows. As the field evolves into the direction of systematically applying autoformalization towards large mathematical libraries, the need to improve syntactic, terminological and semantic control increases. This paper proposes the coordinated use of three mechanisms, most-similar retrieval augmented generation (MS-RAG), denoising steps, and auto-correction with syntax error feedback (Auto-SEF) to improve autoformalization…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsMathematics, Computing, and Information Processing
