Probing Mechanical Reasoning in Large Vision Language Models

Haoran Sun; Qingying Gao; Haiyun Lyu; Dezhi Luo; Yijiang Li; Hokin Deng

arXiv:2410.00318 · cs.AI · August 14, 2025

TL;DR

This paper evaluates the mechanical reasoning abilities of 26 Vision Language Models across various physics domains, revealing significant gaps compared to human performance and highlighting limitations in current architectures.

Contribution

It introduces a comprehensive benchmark of mechanical reasoning tasks for VLMs and uncovers their persistent shortcomings, especially in complex physics reasoning.

Findings

01

VLMs perform worse than humans across all tested domains.

02

Performance on gear-system and fluid-mechanics tasks does not improve as the number of model parameters increases.

03

Current architectures struggle with mental simulation tasks in physics.

Abstract

Mechanical reasoning is a hallmark of human intelligence, defined by its ubiquitous yet irreplaceable role in human activities ranging from routine tasks to civil engineering. Endowing machines with mechanical reasoning is therefore an important step towards building human-level artificial intelligence. Here, we leveraged 155 cognitive experiments to test the understanding of system stability, gears and pulley systems, the leverage principle, inertia and motion, and fluid mechanics in 26 Vision Language Models (VLMs). Results indicate that VLMs consistently perform worse than humans on all domains and demonstrate significant difficulty in reasoning about gear systems and fluid mechanics. Notably, their performance on these tasks does not improve as the number of parameters increases, suggesting that current attention-based architectures may fail to grasp certain underlying mechanisms required…
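
One way to examine a claim like "performance does not improve as parameter count increases" is a rank correlation between model size and accuracy. The sketch below is not the authors' analysis; it is a minimal illustration using Spearman's rank correlation, written in plain Python. The names `average_ranks` and `spearman`, and every number in `params_billions` and `accuracy`, are hypothetical placeholders and not results from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical placeholder values, NOT taken from the paper:
# model sizes in billions of parameters and accuracy on one task domain.
params_billions = [1, 7, 13, 34, 70]
accuracy = [0.40, 0.30, 0.45, 0.35, 0.42]

rho = spearman(params_billions, accuracy)
print(f"Spearman rho between size and accuracy: {rho:.2f}")
```

A rho near zero would be consistent with accuracy not tracking model size, while a value near 1 would indicate steady improvement with scale. With only a handful of models the estimate is noisy, so it is better read alongside a scatter plot than as a test on its own.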

Code & Models

Datasets

grow-ai-like-a-child/mechanical-reasoning
dataset · 85 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training