Lost in the Vibrations: Vision Language Models Fail the Dynamic Gauges Test
Tairan Fu, Francisco Javier Santos-Martín, Javier Conde, Pedro Reviriego, Elena Merino-Gómez

TL;DR
This paper assesses the limitations of current Vision-Language Models in interpreting analog gauges in industrial settings, highlighting their inadequacy for safety-critical measurement tasks.
Contribution
It introduces a new dataset of gauge videos and evaluates leading VLMs, revealing their inability to meet metrological standards for reliability and traceability.
Findings
Current VLMs struggle with needle trajectory interpretation.
Models fail to reliably analyze scale semantics.
VLMs do not meet IEEE and ISO standards for safety-critical use.
Abstract
The digital transformation of industrial manufacturing increasingly relies on the ability of autonomous robots to interact with legacy infrastructure, particularly analog gauges. While Vision-Language Models (VLMs) have demonstrated potential in zero-shot instrument recognition, their deployment in measurement systems remains constrained by an inherent inability to accurately analyze high-frequency temporal events and needle vibrations. This paper evaluates state-of-the-art models, including GPT-5 and Gemini 3, against the strict requirements of metrology and uncertainty quantification. To facilitate this evaluation, we introduce a novel dataset comprising video sequences of various gauge types (circular, linear, and Vernier) under diverse motion speed profiles. Our findings indicate that current VLMs exhibit limited ability in interpreting needle trajectories and scale semantics,…
