Coding the Visual World: From Image to Simulation Using Vision Language Models
Sagi Eppel

TL;DR
This paper investigates how Vision Language Models (VLMs) can interpret and simulate complex systems depicted in images by writing and executing code that generates synthetic images, revealing both their strengths and their limitations in understanding visual systems.
Contribution
It introduces the Im2Sim methodology, demonstrates that VLMs can model complex systems, and highlights the contrast between their strong high-level understanding and their limited perception of fine detail.
Findings
VLMs can understand and simulate multi-component systems across various domains.
VLMs exhibit limited ability to replicate fine image details.
Leading VLMs show strong high-level system understanding but limited low-level pattern recognition.
Abstract
The ability to construct mental models of the world is a central aspect of understanding. Similarly, visual understanding can be viewed as the ability to construct a representative model of the system depicted in an image. This work explores the capacity of Vision Language Models (VLMs) to recognize and simulate the systems and mechanisms depicted in images using the Im2Sim methodology. The VLM is given a natural image of a real-world system (e.g., cities, clouds, vegetation) and is tasked with describing the system and writing code that simulates and generates it. This generative code is then executed to produce a synthetic image, which is compared against the original. This approach is tested on various complex emergent systems, ranging from physical systems (waves, lights, clouds) to vegetation, cities, materials, and geological formations. Through analysis of the models and images…
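The abstract describes Im2Sim as a loop: a natural image goes in, the VLM returns a description plus simulation code, the code is executed to render a synthetic image, and that image is compared with the original. The following is a minimal Python sketch of that loop under stated assumptions, not the paper's implementation. `hypothetical_vlm` is a stub standing in for a real VLM API; the `generate(h, w)` entry point, the toy wave code, the grayscale-grid image format, and the mean-absolute-difference score are all invented for illustration. The abstract does not say how the comparison is actually performed.

```python
import math
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image, row-major, values in [0, 1] (assumption)


def hypothetical_vlm(image: Image) -> dict:
    """Stub for a real VLM call (hypothetical). A real model would inspect the
    image; this stub ignores it and returns a fixed description and code so the
    sketch runs offline."""
    code = (
        "import math\n"
        "def generate(h, w):\n"
        "    # Toy 'wave' simulation: horizontal sinusoidal bands with period 8 rows\n"
        "    return [[0.5 + 0.5 * math.sin(2 * math.pi * y / 8) for x in range(w)]\n"
        "            for y in range(h)]\n"
    )
    return {"description": "Parallel wave crests (stub answer)", "code": code}


def run_generated_code(code: str, h: int, w: int) -> Image:
    """Execute model-written code in a fresh namespace and call its `generate`
    entry point (the entry-point name is an assumption of this sketch)."""
    namespace: dict = {}
    exec(code, namespace)  # untrusted code: sandbox it in any real use
    return namespace["generate"](h, w)


def mean_abs_diff(a: Image, b: Image) -> float:
    """Placeholder pixel-wise comparison; the paper's comparison may differ."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def im2sim(image: Image, vlm: Callable[[Image], dict]) -> Tuple[str, Image, float]:
    """Image -> (description, code) -> synthetic image -> comparison score."""
    answer = vlm(image)
    synthetic = run_generated_code(answer["code"], len(image), len(image[0]))
    return answer["description"], synthetic, mean_abs_diff(image, synthetic)


if __name__ == "__main__":
    h, w = 16, 16
    # An "original" that happens to match the stub's simulation exactly,
    # so the score is 0.0 here.
    original = [[0.5 + 0.5 * math.sin(2 * math.pi * y / 8) for x in range(w)]
                for y in range(h)]
    description, synthetic, score = im2sim(original, hypothetical_vlm)
    print(description, score)
```

Swapping the stub for a real model client and the pixel metric for a perceptual or model-based judgment would keep the same structure; the point of the sketch is only the describe, generate, execute, and compare ordering that the abstract lays out.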
Taxonomy
Topics: Language and cultural evolution · Language, Metaphor, and Cognition · Categorization, perception, and language
