TOC-Bench: A Temporal Object Consistency Benchmark for Video Large Language Models

Junzhe Chen; Siyuan Meng; Yuxi Chen; Man Zhao; Wenyao Gui; Xiaojie Guo

arXiv:2605.09904·cs.CV·May 13, 2026

TL;DR

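TOC-Bench is a diagnostic benchmark that tests whether Video-LLMs preserve the identity, state, and continuity of the same object across occlusion, disappearance, reappearance, state transitions, and cross-object interactions. It reveals weaknesses in event counting, event ordering, and hallucination-aware reasoning.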

Contribution

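The paper introduces TOC-Bench, a structured, human-verified benchmark for assessing temporal object consistency in Video-LLMs. A filtering protocol is used to ensure that questions require temporally ordered visual evidence rather than language priors, single-frame shortcuts, or unordered frame cues.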

Findings

01 Current Video-LLMs struggle with object identity and event ordering.

02 Temporal object consistency is a major unresolved challenge for Video-LLMs.

03 TOC-Bench reveals weaknesses in event counting, ordering, and hallucination-aware reasoning.

Abstract

Video large language models (Video-LLMs) have made strong progress in general video understanding, but their ability to maintain temporal object consistency remains underexplored. Existing benchmarks often emphasize event recognition, action understanding, or coarse temporal reasoning, while rarely testing whether models can preserve the identity, state, and continuity of the same object across occlusion, disappearance, reappearance, state transitions, and cross-object interactions. We introduce TOC-Bench, a diagnostic benchmark for evaluating temporal object consistency in Video-LLMs. TOC-Bench is object-track grounded: each queried subject is linked to a per-frame trajectory and a structured temporal event timeline. To ensure that questions require temporally ordered visual evidence rather than language priors, single-frame shortcuts, or unordered frame cues, we design a three-layer…
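To make "object-track grounded" concrete, here is a minimal sketch of how a per-frame trajectory could be turned into a temporal event timeline. It is not TOC-Bench's actual schema: the names `ObjectTrack`, `Event`, and `timeline`, the box format, and the event labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical box format (x1, y1, x2, y2); the benchmark's real format may differ.
Box = tuple[float, float, float, float]


@dataclass
class Event:
    frame: int
    kind: str  # illustrative labels: "appear", "disappear", "reappear"


@dataclass
class ObjectTrack:
    object_id: str
    # One entry per frame; None means the object is not visible in that frame.
    boxes: list[Optional[Box]] = field(default_factory=list)

    def timeline(self) -> list[Event]:
        """Derive visibility events by scanning the trajectory in frame order."""
        events: list[Event] = []
        seen_before = False
        prev_visible = False
        for frame, box in enumerate(self.boxes):
            visible = box is not None
            if visible and not prev_visible:
                # First sighting is "appear"; any later one is "reappear".
                events.append(Event(frame, "reappear" if seen_before else "appear"))
                seen_before = True
            elif prev_visible and not visible:
                events.append(Event(frame, "disappear"))
            prev_visible = visible
        return events


track = ObjectTrack(
    "cup_1",
    [(0, 0, 10, 10), (1, 0, 11, 10), None, None, (5, 0, 15, 10)],
)
print([(e.frame, e.kind) for e in track.timeline()])
# [(0, 'appear'), (2, 'disappear'), (4, 'reappear')]
```

Under this sketch, a question such as "how many times does the cup leave the view?" can only be answered from the ordered timeline. No single frame contains the answer, which is the kind of temporal dependency the abstract describes.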

Code & Models

Repositories

cjzcjz666/toc_bench.git (GitHub)
