MMFormalizer: Multimodal Autoformalization in the Wild
Jing Xiong, Qi Han, Yunta Hsieh, Hui Shen, Huajian Xin, Chaofan Tao, Chenyang Zhao, Hengyuan Zhang, Taiqiang Wu, Zhen Zhang, Haochen Wang, Zhongwei Wan, Lingpeng Kong, Ngai Wong

TL;DR
MMFormalizer introduces a novel multimodal autoformalization framework that integrates visual perception with formal reasoning, enabling the translation of real-world multimodal data into formal mathematical statements across diverse physics domains.
Contribution
The framework extends autoformalization to multimodal inputs, recursively constructs formal propositions from perceptually grounded primitives, and is evaluated on a new benchmark, PhyX-AF, where it demonstrates complex physical reasoning.
Findings
GPT-5 achieves high accuracy in physical reasoning tasks.
Geometry remains the most challenging domain.
First multimodal autoformalization method to handle classical mechanics, relativity, quantum mechanics, and thermodynamics.
Abstract
Autoformalization, which translates natural language mathematics into formal statements to enable machine reasoning, faces fundamental challenges in the wild due to the multimodal nature of the physical world, where physics requires inferring hidden constraints (e.g., mass or energy) from visual elements. To address this, we propose MMFormalizer, which extends autoformalization beyond text by integrating adaptive grounding with entities from real-world mathematical and physical domains. MMFormalizer constructs formal propositions from perceptually grounded primitives through recursive grounding and axiom composition, with adaptive recursive termination ensuring that every abstraction is supported by visual evidence and anchored in dimensional or axiomatic grounding. We evaluate MMFormalizer on a new benchmark, PhyX-AF, comprising 115 curated samples from MathVerse, PhyX,…
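The abstract describes recursive grounding with adaptive termination only at a high level. The Python sketch below is not the authors' implementation; it illustrates one possible reading under stated assumptions: propositions form a tree whose leaves are perceptual primitives (or axioms), and a composite node counts as grounded when its children are grounded and it is anchored either axiomatically or by dimensional consistency over mass, length, and time. The `Node` class, `expected_dim`, `is_grounded`, and the fixed `max_depth` budget are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Physical dimension as exponents over (mass M, length L, time T).
Dim = tuple[int, int, int]


@dataclass
class Node:
    """Hypothetical proposition node in a grounding tree (illustrative only)."""
    name: str
    dim: Dim | None = None          # dimension, if the node denotes a quantity
    visual_evidence: bool = False   # leaf tied to a detected visual primitive
    is_axiom: bool = False          # node accepted as an axiom or known law
    # Sub-propositions paired with the power each contributes to this node.
    children: list[tuple[Node, int]] = field(default_factory=list)


def expected_dim(node: Node) -> Dim | None:
    """Dimension implied by multiplying the children raised to their powers."""
    total = [0, 0, 0]
    for child, power in node.children:
        if child.dim is None:
            return None  # a dimensionless-unknown child blocks the check
        for i in range(3):
            total[i] += power * child.dim[i]
    return (total[0], total[1], total[2])


def is_grounded(node: Node, depth: int = 0, max_depth: int = 8) -> bool:
    """Recursively check that every abstraction bottoms out in evidence.

    Leaves must carry visual evidence or be axioms; composite nodes need
    grounded children plus an axiomatic or dimensional anchor.
    """
    if depth > max_depth:
        return False  # recursion budget exhausted: treat as ungrounded
    if not node.children:
        return node.visual_evidence or node.is_axiom
    if not all(is_grounded(c, depth + 1, max_depth) for c, _ in node.children):
        return False
    if node.is_axiom:
        return True  # axiomatic anchor
    return node.dim is not None and expected_dim(node) == node.dim


# Usage: kinetic energy E_k = (1/2) m v^2 built from two visually grounded leaves.
m = Node("m", dim=(1, 0, 0), visual_evidence=True)   # mass read off a label
v = Node("v", dim=(0, 1, -1), visual_evidence=True)  # velocity from the diagram
kinetic = Node("E_k", dim=(1, 2, -2), children=[(m, 1), (v, 2)])
bad = Node("E_bad", dim=(1, 2, -2), children=[(m, 1), (v, 1)])

print(is_grounded(kinetic))  # True
print(is_grounded(bad))      # False: M L T^-1 is not an energy
```

The factor 1/2 is dimensionless, so it does not enter the check. A fixed depth budget stands in here for the adaptive termination rule, whose exact criterion is not given in this summary.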
