MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science
Erle Zhu, Yadi Liu, Zhe Zhang, Xujun Li, Jin Zhou, Xinjie Yu, Minlie Huang, Hongning Wang

TL;DR
MAPS enhances multi-modal scientific reasoning in large language models by integrating physical diagram understanding and simulation-based reasoning, significantly improving accuracy in expert-level physics problems.
Contribution
Introduces MAPS, a novel framework combining physical perception and simulation to advance multi-modal reasoning in large language models for scientific tasks.
Findings
Significantly improves reasoning accuracy on college-level circuit problems.
Outperforms existing models in multi-modal scientific reasoning.
Demonstrates effectiveness of synthetic data fine-tuning for physical diagram understanding.
Abstract
Pre-trained on extensive text and image corpora, current Multi-Modal Large Language Models (MLLMs) have shown strong capabilities in general visual reasoning tasks. However, their performance still lags in physical domains that require understanding diagrams with complex physical structures and performing quantitative analysis based on multi-modal information. To address this, we develop a new framework, named Multi-Modal Scientific Reasoning with Physics Perception and Simulation (MAPS), based on an MLLM. MAPS decomposes expert-level multi-modal reasoning tasks into physical diagram understanding via a Physical Perception Model (PPM) and reasoning with physical knowledge via a simulator. The PPM module is obtained by fine-tuning a visual language model using carefully designed synthetic data with paired physical diagrams and corresponding simulation language descriptions. At the inference…
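The simulator half of this decomposition can be illustrated with a toy example. The sketch below assumes the simulation language is a small SPICE-like netlist subset (resistors and DC voltage sources only) and substitutes a hand-written modified nodal analysis (MNA) solver for a real circuit simulator. The functions `parse_netlist`, `solve_linear`, and `dc_operating_point` are hypothetical and are not taken from the MAPS code; in MAPS, the netlist would come from the PPM rather than being written by hand.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical helper code: node names treated as the reference (ground) node, as in SPICE.
GROUND = {"0", "gnd"}

Element = Tuple[str, str, str, float]  # (name, node+, node-, value)


def parse_netlist(text: str) -> List[Element]:
    """Parse a tiny SPICE-like subset: one '<name> <node+> <node-> <value>' per line."""
    elements: List[Element] = []
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):  # skip blank lines and SPICE comment lines
            continue
        name, n_plus, n_minus, value = line.split()[:4]
        elements.append((name, n_plus.lower(), n_minus.lower(), float(value)))
    return elements


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting on a dense system a @ x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular MNA matrix (floating node or loop of V sources?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dc_operating_point(netlist: str) -> Dict[str, float]:
    """DC modified nodal analysis for resistors (R*) and DC voltage sources (V*)."""
    elements = parse_netlist(netlist)
    nodes = sorted({n for _, a, b, _ in elements for n in (a, b)} - GROUND)
    index = {n: i for i, n in enumerate(nodes)}
    v_names = [e[0] for e in elements if e[0][0].upper() == "V"]
    size = len(nodes) + len(v_names)
    A = [[0.0] * size for _ in range(size)]
    z = [0.0] * size

    def idx(node: str) -> Optional[int]:
        return None if node in GROUND else index[node]

    k = len(nodes)  # row/column of the next voltage-source current unknown
    for name, a, b, value in elements:
        i, j = idx(a), idx(b)
        kind = name[0].upper()
        if kind == "R":
            g = 1.0 / value  # stamp the conductance into the nodal matrix
            for p, q, s in ((i, i, g), (j, j, g), (i, j, -g), (j, i, -g)):
                if p is not None and q is not None:
                    A[p][q] += s
        elif kind == "V":
            # Extra unknown: current entering the + terminal; extra row: V(+) - V(-) = value.
            if i is not None:
                A[i][k] += 1.0
                A[k][i] += 1.0
            if j is not None:
                A[j][k] -= 1.0
                A[k][j] -= 1.0
            z[k] = value
            k += 1
        else:
            raise ValueError(f"unsupported element {name!r}")

    x = solve_linear(A, z)
    result = {f"V({n})": x[index[n]] for n in nodes}
    for offset, name in enumerate(v_names):
        result[f"I({name})"] = x[len(nodes) + offset]
    return result


circuit = """
* 10 V source driving a two-resistor divider
V1 in 0 10
R1 in out 1000
R2 out 0 1000
"""
print(dc_operating_point(circuit))
```

For this divider, V(out) should come out at half the source voltage (5 V), and I(V1) is negative because the source drives current out of its + terminal. A production simulator supports far more element types and analyses; this solver only illustrates the kind of quantitative step that MAPS delegates to a simulator instead of leaving to the MLLM.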
Peer Reviews
Decision: ICLR 2025 Poster
The advantages of this paper are mainly reflected in the following aspects: - This paper proposes an innovative process framework that can combine physical perception models (PPMs) with simulator outcomes to infer answers to physical problems. The framework integrates the understanding of physical diagrams with the reasoning of physical knowledge, and its effectiveness has been validated through experiments; - The paper designs and introduces a synthetic dataset named ppm-syn-lprc, which is used
Writing: - There are some typos in the article, and it is recommended that the authors carefully proofread and correct them to enhance the professionalism and readability of the paper. Experimental Design: - Generalization Issues: The results of this study have only been validated on the GPT-4V model, which may not be sufficient to demonstrate the applicability of the framework to other model architectures. It is suggested that the authors extend the evaluation of the framework to different mo
The paper makes several contributions. It creates a large dataset of diverse circuit diagrams. It proposes to decompose expert-level multi-modal reasoning tasks into physical diagram understanding via a Physical Perception Model (PPM) and reasoning with physical knowledge via a simulator. (This is a good approach.) The paper shows that MAPS improves reasoning accuracy in the electronic circuit analysis domain. The authors point out that all the information about a circuit may not be in the diagr
It is not clear where errors happen. An error analysis of a sample of the evaluation data would be helpful. In particular, please provide a breakdown of errors by type (e.g., perception errors vs. reasoning errors), and give examples of common failure cases. What are the different kinds of questions addressed in the paper? Do they need simulation or just a solver? (With simulation capability, many more complex questions can be answered using multiple simulations.) Please provide a categorization
S1: The integration of perception and simulation for multi-modal scientific reasoning is well-conceived and leverages MLLM strengths while mitigating their weaknesses in handling complex diagrams. S2: Results from circuit analysis problems highlight a notable increase in accuracy, showcasing MAPS' ability to outperform current state-of-the-art methods. S3: The paper provides comprehensive explanations for data synthesis, PPM training, and the inference process, enhancing reproducibility.
W1: The framework is tested primarily on circuit analysis, which may not fully capture its adaptability across different physical sciences. W2: The multi-step process involving diagram conversion, SL generation, and simulation may introduce cumulative errors, which could affect real-world applicability. W3: The reliance on synthetic data poses a challenge for real-world accuracy, as unseen or complex diagrams might not align with generated examples.
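W2's concern about cumulative errors can be made concrete with a simple model. Assuming, purely for illustration, that a pipeline of $k$ stages fails independently with per-stage success rates $p_1, \dots, p_k$, the end-to-end success rate is their product. The per-stage values below are hypothetical and are not results from the paper.

```latex
P_{\text{end-to-end}} = \prod_{i=1}^{k} p_i,
\qquad \text{e.g. } 0.90 \times 0.95 \times 0.98 \approx 0.838
```

Under this assumption, three individually accurate stages still compound into noticeably lower overall accuracy; correlated errors between stages would change the estimate in either direction.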
Taxonomy
Topics: Scientific Computing and Data Management
