Have Multimodal Large Language Models (MLLMs) Really Learned to Tell the Time on Analog Clocks?

Tairan Fu; Miguel González; Javier Conde; Elena Merino-Gómez; Pedro Reviriego

arXiv:2505.10862·cs.CL·May 19, 2025

TL;DR

This paper investigates whether multimodal large language models like GPT-4.1 truly understand how to tell time on analog clocks or if they merely recognize patterns from training data, highlighting their limitations in generalization.

Contribution

The study evaluates GPT-4.1's ability to tell the time on analog clocks, showing where it has improved and where its performance reflects learned patterns rather than a general understanding of how clocks work.

Findings

01

Models show some ability to read clock times.

02

Models struggle with generalizing to new clock images.

03

Fine-tuning may improve performance but does not fully solve the problem.

Abstract

Multimodal Large Language Models, which can answer complex questions about an image, struggle to tell the time on analog clocks. This is probably due to the lack of images showing clocks at different times in their training sets. In this work, we explore this issue with one of the latest MLLMs, GPT-4.1, to understand why MLLMs fail to tell the time and whether fine-tuning can solve the problem. The results show that models are making progress in reading the time on analog clocks. But have they really learned to do it, or have they only learned patterns in their training datasets? We put the models to the test with different clocks to illustrate the limitations of MLLMs in abstracting and generalizing.
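As a rough illustration of what "telling the time" involves geometrically, the sketch below maps a time to hand angles and scores a predicted reading against the true one. It is not taken from the paper: the function names `hand_angles` and `time_error_minutes`, and the choice of a circular error on a 12-hour dial, are assumptions made here for illustration only.

```python
def hand_angles(hour: int, minute: int, second: int = 0) -> tuple[float, float, float]:
    """Return (hour, minute, second) hand angles in degrees, clockwise from 12.

    Hypothetical helper, not from the paper. The hour hand moves 30 degrees
    per hour; the minute and second hands move 6 degrees per unit.
    """
    second_angle = second * 6.0
    minute_angle = (minute + second / 60.0) * 6.0
    hour_angle = ((hour % 12) + minute / 60.0 + second / 3600.0) * 30.0
    return hour_angle, minute_angle, second_angle


def time_error_minutes(pred: tuple[int, int], truth: tuple[int, int]) -> int:
    """Circular error in minutes between two (hour, minute) readings.

    Assumed metric for illustration: the dial wraps every 12 hours
    (720 minutes), so 11:55 and 12:05 are 10 minutes apart, not 710.
    """
    p = (pred[0] % 12) * 60 + pred[1]
    t = (truth[0] % 12) * 60 + truth[1]
    diff = abs(p - t) % 720
    return min(diff, 720 - diff)
```

For example, `hand_angles(3, 0)` places the hour hand at 90 degrees and the minute hand at 0, and `time_error_minutes((11, 55), (12, 5))` gives 10. A model that has abstracted this mapping should remain accurate on clocks whose appearance differs from its training images, which is the kind of generalization the abstract questions.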

Code & Models

Datasets

migonsa/analog_clocks_combinations_for_finetuning
dataset · 96 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Language and cultural evolution

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Dropout · Adam · Multi-Head Attention · Dense Connections · Layer Normalization · Softmax