A Matter of Time: Revealing the Structure of Time in Vision-Language Models

Nidham Tekaya; Manuela Waldner; Matthias Zeppelzauer

arXiv:2510.19559·cs.CV·October 23, 2025

TL;DR

This paper explores how vision-language models understand and represent time, introducing a new benchmark dataset and methods to extract explicit timelines from model embeddings for improved temporal reasoning.

Contribution

It introduces TIME10k, a benchmark dataset for temporal evaluation of VLMs, and proposes methods to derive explicit timeline representations from their embeddings.

Findings

01. Temporal information in VLMs is structured on a low-dimensional, non-linear manifold.

02. Proposed timeline methods outperform prompt-based baselines in temporal reasoning tasks.

03. The approach is computationally efficient and effective for modeling time in VLMs.
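To make the notion of a prompt-based baseline concrete, here is a minimal sketch of one plausible form: embed one text prompt per candidate year, then pick the year whose prompt embedding has the highest cosine similarity to the image embedding. The summary does not say which baselines the paper evaluates, so the function name `predict_year_by_prompt`, the decade grid, and the synthetic embeddings are all assumptions made for illustration. Random vectors stand in for real VLM outputs.

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def predict_year_by_prompt(image_emb, year_text_embs, years):
    """Hypothetical baseline: return the year whose prompt embedding
    is most similar to the image embedding."""
    sims = normalize(year_text_embs) @ normalize(image_emb)
    return int(years[int(np.argmax(sims))])

# Synthetic stand-ins for VLM outputs (assumption: not real CLIP embeddings).
rng = np.random.default_rng(0)
years = np.arange(1900, 2000, 10)            # 1900, 1910, ..., 1990
year_text_embs = rng.normal(size=(len(years), 64))

# Fake "image" embedding: the 1950 prompt embedding plus small noise.
image_emb = year_text_embs[5] + 0.05 * rng.normal(size=64)

predicted = predict_year_by_prompt(image_emb, year_text_embs, years)
print(predicted)
```

With a real model, `year_text_embs` would come from the text encoder applied to prompts for each year and `image_emb` from the image encoder. A baseline of this kind treats the years as unordered classes, so it ignores chronological order entirely.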

Abstract

Large-scale vision-language models (VLMs) such as CLIP have gained popularity for their generalizable and expressive multimodal representations. By leveraging large-scale training data with diverse textual metadata, VLMs acquire open-vocabulary capabilities, solving tasks beyond their training scope. This paper investigates the temporal awareness of VLMs, assessing their ability to position visual content in time. We introduce TIME10k, a benchmark dataset of over 10,000 images with temporal ground truth, and evaluate the time-awareness of 37 VLMs using a novel methodology. Our investigation reveals that temporal information is structured along a low-dimensional, non-linear manifold in the VLM embedding space. Based on this insight, we propose methods to derive an explicit "timeline" representation from the embedding space. These representations model time and its chronological…
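The idea of an explicit timeline on a low-dimensional, non-linear manifold can be illustrated with a small sketch. This is not the paper's method, whose details are not given here. It is one simple stand-in: reduce the year embeddings with PCA, connect them in chronological order as a polyline, and date a query by projecting it onto the nearest segment and interpolating. The names `fit_timeline` and `locate_on_timeline`, the helix-shaped synthetic data, and the choice of three components are assumptions.

```python
import numpy as np

def fit_timeline(year_embs, n_components=3):
    """Project chronologically ordered year embeddings into a PCA subspace."""
    mean = year_embs.mean(axis=0)
    _, _, vt = np.linalg.svd(year_embs - mean, full_matrices=False)
    basis = vt[:n_components]
    nodes = (year_embs - mean) @ basis.T
    return mean, basis, nodes

def locate_on_timeline(emb, years, mean, basis, nodes):
    """Project emb onto the polyline through the nodes and interpolate a year."""
    p = (emb - mean) @ basis.T
    best_dist, best_year = np.inf, float(years[0])
    for i in range(len(nodes) - 1):
        a, b = nodes[i], nodes[i + 1]
        seg = b - a
        # Position of the closest point on segment a-b, clipped to [0, 1].
        t = np.clip(np.dot(p - a, seg) / np.dot(seg, seg), 0.0, 1.0)
        dist = np.linalg.norm(p - (a + t * seg))
        if dist < best_dist:
            best_dist = dist
            best_year = years[i] + t * (years[i + 1] - years[i])
    return float(best_year)

def helix(theta):
    """Synthetic curved 'time manifold' in 3-D (assumption, for illustration)."""
    theta = np.atleast_1d(theta)
    return np.stack([np.cos(theta), np.sin(theta), theta], axis=1)

rng = np.random.default_rng(0)
years = np.arange(1900, 2001, 10)                  # 1900, 1910, ..., 2000
theta = np.linspace(0.0, np.pi / 2, len(years))

# Embed the 3-D curve isometrically into 32 dimensions (orthonormal columns).
q, _ = np.linalg.qr(rng.normal(size=(32, 3)))
year_embs = helix(theta) @ q.T

mean, basis, nodes = fit_timeline(year_embs)

# Query placed halfway between the 1950 and 1960 points on the curve.
query = (helix((theta[5] + theta[6]) / 2) @ q.T)[0]
estimate = locate_on_timeline(query, years, mean, basis, nodes)
print(round(estimate, 2))
```

Unlike a nearest-prompt lookup, this representation keeps the order of the years, so a query can land between two reference years. The polyline is only a crude approximation of a smooth curve. Real embeddings would also be noisy and would not lie exactly in a three-dimensional subspace.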
