TL;DR
FormalScience introduces a human-in-the-loop pipeline for autoformalising scientific reasoning into Lean, enabling domain experts to generate formal proofs efficiently, demonstrated on a physics dataset with high validity.
Contribution
It presents a domain-agnostic, agentic approach for autoformalisation in scientific fields, including a physics dataset and analysis of semantic drift in LLM-based formalisation.
Findings
Achieved perfect formal validity on the physics dataset.
Evaluated models with zero-shot prompting, self-refinement, and multi-stage approaches.
Released code and interactive system for broader scientific autoformalisation.
Abstract
Formalising informal mathematical reasoning into formally verifiable code is a significant challenge for large language models. In scientific fields such as physics, domain-specific machinery (\textit{e.g.} Dirac notation, vector calculus) imposes additional formalisation challenges that modern LLMs and agentic approaches have yet to tackle. To aid autoformalisation in scientific domains, we present FormalScience; a domain-agnostic human-in-the-loop agentic pipeline that enables a single domain expert (without deep formal language experience) to produce \textit{syntactically correct} and \textit{semantically aligned} formal proofs of informal reasoning for low economic cost. Applying FormalScience to physics, we construct FormalPhysics, a dataset of 200 university-level (LaTeX) physics problems and solutions (primarily quantum mechanics and electromagnetism), along with their Lean4…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
