FormalScience: Scalable Human-in-the-Loop Autoformalisation of Science with Agentic Code Generation in Lean

Jordan Meadows; Lan Zhang; Andre Freitas

arXiv:2604.23002·cs.AI·April 28, 2026

FormalScience: Scalable Human-in-the-Loop Autoformalisation of Science with Agentic Code Generation in Lean

Jordan Meadows, Lan Zhang, Andre Freitas

PDF

1 Repo

TL;DR

FormalScience introduces a human-in-the-loop pipeline for autoformalising scientific reasoning into Lean, enabling domain experts to generate formal proofs efficiently, demonstrated on a physics dataset with high validity.

Contribution

It presents a domain-agnostic, agentic approach for autoformalisation in scientific fields, including a physics dataset and analysis of semantic drift in LLM-based formalisation.

Findings

01

Achieved perfect formal validity on the physics dataset.

02

Evaluated models with zero-shot prompting, self-refinement, and multi-stage approaches.

03

Released code and interactive system for broader scientific autoformalisation.

Abstract

Formalising informal mathematical reasoning into formally verifiable code is a significant challenge for large language models. In scientific fields such as physics, domain-specific machinery (\textit{e.g.} Dirac notation, vector calculus) imposes additional formalisation challenges that modern LLMs and agentic approaches have yet to tackle. To aid autoformalisation in scientific domains, we present FormalScience; a domain-agnostic human-in-the-loop agentic pipeline that enables a single domain expert (without deep formal language experience) to produce \textit{syntactically correct} and \textit{semantically aligned} formal proofs of informal reasoning for low economic cost. Applying FormalScience to physics, we construct FormalPhysics, a dataset of 200 university-level (LaTeX) physics problems and solutions (primarily quantum mechanics and electromagnetism), along with their Lean4…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

jmeadows17/formal-science
github

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.