Consistent Autoformalization for Constructing Mathematical Libraries

Lan Zhang; Xin Quan; Andre Freitas

arXiv:2410.04194·cs.CL·June 16, 2025

Consistent Autoformalization for Constructing Mathematical Libraries

Lan Zhang, Xin Quan, Andre Freitas

PDF

Open Access 1 Repo 1 Video

TL;DR

This paper introduces a coordinated approach combining retrieval, denoising, and auto-correction mechanisms to enhance the consistency and reliability of autoformalization in large mathematical libraries using LLMs.

Contribution

It proposes a novel integrated framework with MS-RAG, denoising, and Auto-SEF to improve autoformalization quality in complex mathematical domains.

Findings

01

Mechanisms improve syntactic consistency

02

Results are more terminologically accurate

03

Semantic coherence is enhanced

Abstract

Autoformalization is the task of automatically translating mathematical content written in natural language to a formal language expression. The growing language interpretation capabilities of Large Language Models (LLMs), including in formal languages, are lowering the barriers for autoformalization. However, LLMs alone are not capable of consistently and reliably delivering autoformalization, in particular as the complexity and specialization of the target domain grows. As the field evolves into the direction of systematically applying autoformalization towards large mathematical libraries, the need to improve syntactic, terminological and semantic control increases. This paper proposes the coordinated use of three mechanisms, most-similar retrieval augmented generation (MS-RAG), denoising steps, and auto-correction with syntax error feedback (Auto-SEF) to improve autoformalization…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

lanzhang128/retrieval_augmented_autoformalization
noneOfficial

Videos

Consistent Autoformalization for Constructing Mathematical Libraries· underline

Taxonomy

TopicsMathematics, Computing, and Information Processing