CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State Transitions

Simon Kohaut; Daniel Ochs; Shun Zhang; Benedict Flade; Julian Eggert; Kristian Kersting; Devendra Singh Dhami

arXiv:2512.01095·cs.CV·December 2, 2025

TL;DR

CycliST is a benchmark dataset for evaluating Video Language Models on their ability to reason about cyclical state transitions. It reveals current models' limitations in understanding periodic patterns and temporal dynamics.

Contribution

This paper presents CycliST, a new synthetic video benchmark for testing VLMs on cyclical reasoning, highlighting the gaps in current models' temporal understanding and generalization capabilities.

Findings

1. Current VLMs struggle with cyclical motion detection.

2. Models lack robust temporal reasoning about periodic patterns.

3. No single model outperforms others across all tasks.

Abstract

We present CycliST, a novel benchmark dataset designed to evaluate Video Language Models (VLMs) on their ability to reason textually over cyclical state transitions. CycliST captures fundamental aspects of real-world processes by generating synthetic, richly structured video sequences featuring periodic patterns in object motion and visual attributes. CycliST employs a tiered evaluation system that progressively increases difficulty through variations in the number of cyclic objects, scene clutter, and lighting conditions, challenging state-of-the-art models on their spatio-temporal cognition. We conduct extensive experiments with current state-of-the-art VLMs, both open-source and proprietary, and reveal their limitations in generalizing to cyclical dynamics such as linear and orbital motion, as well as time-dependent changes in visual attributes like color and scale. Our results…
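To make the notion of a cyclical state transition concrete, the following is a minimal sketch in Python. It is an illustration only and not the benchmark's actual generator: the names `ObjectState`, `orbital_state`, and `smallest_period`, the 12-frame period, and the three-color palette are all assumptions introduced here. It models one object on an orbital path whose color also changes periodically, then recovers the period from the sequence of states.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectState:
    """Hypothetical per-frame state: 2D position plus a color attribute."""
    x: float
    y: float
    color: str


def orbital_state(frame, period, radius=1.0, colors=("red", "green", "blue")):
    """State of an illustrative object at `frame`; repeats every `period` frames."""
    phase = (frame % period) / period           # in [0, 1)
    angle = 2 * math.pi * phase                 # orbital motion around the origin
    color = colors[int(phase * len(colors))]    # color cycles once per orbit
    return ObjectState(
        round(radius * math.cos(angle), 6),
        round(radius * math.sin(angle), 6),
        color,
    )


def smallest_period(states):
    """Smallest p with states[i] == states[i + p] for all valid i, else None."""
    n = len(states)
    for p in range(1, n // 2 + 1):
        if all(states[i] == states[i + p] for i in range(n - p)):
            return p
    return None


# Four full cycles of an assumed 12-frame period.
states = [orbital_state(f, period=12) for f in range(48)]
detected = smallest_period(states)
```

Here `detected` is the ground-truth answer to a question such as "after how many frames does the object return to its starting state?", which is the kind of periodic reasoning the abstract describes; a real evaluation would pose it in text over rendered video rather than over symbolic states.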

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation