Have Multimodal Large Language Models (MLLMs) Really Learned to Tell the Time on Analog Clocks?
Tairan Fu, Miguel González, Javier Conde, Elena Merino-Gómez, Pedro Reviriego

TL;DR
This paper investigates whether multimodal large language models (MLLMs) such as GPT-4.1 truly understand how to tell the time on analog clocks or merely recognize patterns from their training data, highlighting their limited ability to generalize.
Contribution
The study evaluates GPT-4.1's ability to tell the time on analog clocks, revealing both its progress and its limitations, and asks whether its performance reflects genuine understanding or pattern recognition.
Findings
Models show some ability to read clock times.
Models struggle with generalizing to new clock images.
Fine-tuning may improve but does not fully solve the problem.
Abstract
Multimodal Large Language Models (MLLMs), which can answer complex questions about an image, struggle to tell the time on analog clocks. This is probably due to the lack of images of clocks showing different times in their training sets. In this work we explore this issue with one of the latest MLLMs, GPT-4.1, to understand why MLLMs fail to tell the time and whether fine-tuning can solve the problem. The results show that models are making progress in reading the time on analog clocks. But have they really learned to do it, or have they only learned patterns in their training datasets? We put the models to the test with different clocks to illustrate the limitations of MLLMs in abstracting and generalizing.
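Reading an analog clock reduces to a small amount of geometry, which is also what makes it easy to generate test clocks at arbitrary times. The sketch below is a hypothetical illustration, not the authors' code or evaluation protocol: `hand_angles` gives the hand positions for a given time (the minute hand moves 6° per minute, the hour hand 0.5° per minute), `all_dial_times` enumerates the 720 distinct readings of a 12-hour dial, and `dial_error_minutes` is one assumed way to score a predicted reading, as the circular distance in minutes.

```python
from __future__ import annotations

# Hypothetical helpers for illustration only; names are not from the paper.
MINUTES_PER_DIAL = 12 * 60  # an analog dial repeats every 12 hours


def hand_angles(hour: int, minute: int) -> tuple[float, float]:
    """Clockwise angles in degrees from 12 o'clock: (hour hand, minute hand)."""
    if not 0 <= minute < 60:
        raise ValueError("minute must be in [0, 60)")
    minute_angle = minute * 6.0  # 360 degrees / 60 minutes
    # 30 degrees per hour, plus 0.5 degrees for every elapsed minute
    hour_angle = (hour % 12) * 30.0 + minute * 0.5
    return hour_angle, minute_angle


def all_dial_times() -> list[tuple[int, int]]:
    """Every distinct (hour, minute) reading on a 12-hour dial, hours 1..12."""
    return [(h, m) for h in range(1, 13) for m in range(60)]


def dial_error_minutes(true_time: tuple[int, int], predicted: tuple[int, int]) -> int:
    """Smallest gap in minutes between two readings, wrapping around the dial."""
    t = (true_time[0] % 12) * 60 + true_time[1]
    p = (predicted[0] % 12) * 60 + predicted[1]
    diff = abs(t - p) % MINUTES_PER_DIAL
    return min(diff, MINUTES_PER_DIAL - diff)
```

For example, `hand_angles(3, 30)` returns `(105.0, 180.0)`, and `dial_error_minutes((12, 0), (11, 55))` returns `5` rather than `715`, because the dial wraps around. Scoring with a circular distance, rather than exact-match accuracy alone, distinguishes a reading that is off by a few minutes from one that is off by hours.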
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Language and cultural evolution
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Dropout · Adam · Multi-Head Attention · Dense Connections · Layer Normalization · Softmax
