Vision Language Models Know Law of Conservation without Understanding More-or-Less

Dezhi Luo; Haiyun Lyu; Qingying Gao; Haoran Sun; Yijiang Li; Hokin Deng

arXiv:2410.00332 · cs.AI · August 14, 2025

TL;DR

This study evaluates whether Vision Language Models understand the law of conservation, using a battery of cognitive experiments. It finds that the models grasp the reversibility of operations but struggle with quantity concepts when no transformation is involved, which indicates only a partial understanding.

Contribution

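Introduces ConserveBench, a benchmark of 365 cognitive experiments spanning four dimensions of physical quantity (volume, solid quantity, length, and number) that assesses quantity understanding and reversibility understanding in Vision Language Models.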

Findings

01 Models generally perform well on transformational tasks, which require understanding reversibility.

02 Models tend to fail on non-transformational tasks, which assess quantity understanding.

03 Understanding the reversibility of operations does not imply understanding quantity.

Abstract

Understanding the law of conservation is a critical milestone in human cognitive development, considered to be supported by the apprehension of quantitative concepts and the reversibility of operations. To assess whether this critical component of human intelligence has emerged in Vision Language Models, we curated ConserveBench, a battery of 365 cognitive experiments across four dimensions of physical quantity: volume, solid quantity, length, and number. The former two involve transformational tasks, which require understanding reversibility; the latter two involve non-transformational tasks, which assess quantity understanding. Surprisingly, we find that while Vision Language Models are generally good at transformational tasks, they tend to fail at non-transformational tasks. There is a dissociation between understanding the reversibility of operations and understanding the…
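The abstract groups the four dimensions into two task types, and the reported dissociation is a gap in performance between those two groups. As a minimal sketch of how per-trial results might be tallied to expose such a gap, the Python below maps each dimension to its task type, as stated in the abstract, and computes accuracy per type. The trial format (a list of `(dimension, is_correct)` pairs) and the function name `accuracy_by_task_type` are hypothetical illustrations, not ConserveBench's actual data schema or evaluation code.

```python
from collections import defaultdict

# Task type for each ConserveBench dimension, following the abstract:
# volume and solid quantity are transformational (reversibility);
# length and number are non-transformational (quantity).
TASK_TYPE = {
    "volume": "transformational",
    "solid quantity": "transformational",
    "length": "non-transformational",
    "number": "non-transformational",
}


def accuracy_by_task_type(trials):
    """Return the fraction of correct trials for each task type.

    `trials` is a hypothetical list of (dimension, is_correct) pairs;
    this input format is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for dimension, is_correct in trials:
        task_type = TASK_TYPE[dimension]  # raises KeyError for unknown dimensions
        total[task_type] += 1
        correct[task_type] += int(is_correct)
    return {task_type: correct[task_type] / total[task_type] for task_type in total}
```

For example, the toy trials `[("volume", True), ("solid quantity", True), ("length", False), ("number", True)]` yield 1.0 for the transformational type and 0.5 for the non-transformational type. These numbers are invented to show the calculation and are not results from the paper. Grouping by task type rather than by dimension makes a dissociation between the two groups directly visible as a difference between two accuracies.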

Taxonomy

Topics: Semantic Web and Ontologies