From Informal to Formal -- Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs
Jialun Cao, Yaojie Lu, Meiziniu Li, Haoyang Ma, Haokun Li, Mengda He, Cheng Wen, Le Sun, Hongyu Zhang, Shengchao Qin, Shing-Chi Cheung, Cong Tian

TL;DR
This paper develops a large dataset of formal verification tasks in multiple languages, evaluates various LLMs on these tasks, and fine-tunes smaller models to match larger ones, improving reasoning and coding skills.
Contribution
It introduces a new dataset of 18k formal verification pairs across five languages and demonstrates fine-tuning smaller models to achieve performance comparable to larger models.
Findings
Fine-tuned models perform on par with larger models.
Fine-tuning improves reasoning, mathematics, and coding skills.
Models are publicly released for further research.
Abstract
The research in AI-based formal mathematical reasoning has shown an unstoppable growth trend. These studies have excelled in mathematical competitions like IMO and have made significant progress. This paper focuses on formal verification, an immediate application scenario of formal reasoning, and breaks it down into sub-tasks. We constructed 18k high-quality instruction-response pairs across five formal specification languages (Coq, Lean4, Dafny, ACSL, and TLA+) by distilling gpt-4o and evaluated against ten open-sourced LLMs, including recent popular DeepSeek-R1. We also fine-tuned several 7~8B small models to achieve comparable performance with Deepseek-R1-671B. Interestingly, we observed that fine-tuning with formal data also enhances mathematics, reasoning, and coding capabilities. Fine-tuned models are released at https: //huggingface.co/fm-universe.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
- 🤗fm-universe/llama3.1-8b-fmamodel· 13 dl13 dl
- 🤗fm-universe/llama3.1-8b-instruct-fmamodel· 11 dl11 dl
- 🤗fm-universe/llama3.1-8b-tulu-3-fmamodel· 7 dl· ♡ 17 dl♡ 1
- 🤗fm-universe/llama3.1-8b-ultrachat-fmamodel· 1 dl1 dl
- 🤗fm-universe/qwen2.5-7b-fmamodel· 4 dl4 dl
- 🤗fm-universe/qwen2.5-7b-instruct-fmamodel
- 🤗fm-universe/qwen2.5-coder-7b-instruct-fmamodel· 8 dl· ♡ 98 dl♡ 9
- 🤗fm-universe/deepseek-coder-7b-instruct-v1.5-fmamodel· 3 dl· ♡ 53 dl♡ 5
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques
