CircuitProbe: Tracing Visual Temporal Evidence Flow in Video Language Models
Yiming Zhang, Zhuokai Zhao, Chengzhang Yu, Kun Wang, Zhendong Chu, Qiankun Li, Zihan Chen, Yang Liu, Zenghui Ding, Yining Sun, Qingsong Wen

TL;DR
CircuitProbe is an analysis framework that examines how video-language models represent and use temporal information, and it motivates targeted interventions that improve their temporal understanding.
Contribution
We introduce CircuitProbe, a circuit-level analysis method that localizes and traces temporal evidence in video-language models, enabling effective interventions for temporal reasoning.
Findings
Identifies temporally specialized attention heads in LVLMs.
Targeted interventions improve temporal understanding by up to 2.4%.
Validates the analysis framework's effectiveness on the TempCompass benchmark.
Abstract
Autoregressive large vision-language models (LVLMs) interface video and language by projecting video features into the LLM's embedding space as continuous visual token embeddings. However, it remains unclear where temporal evidence is represented and how it causally influences decoding. To address this gap, we present CircuitProbe, a circuit-level analysis framework that dissects the end-to-end video-language pathway through two stages: (i) Visual Auditing, which localizes object semantics within the projected video-token sequence and reveals their causal necessity via targeted ablations and controlled substitutions; and (ii) Semantic Tracing, which uses logit-lens probing to track the layer-wise emergence of object and temporal concepts, augmented with temporal frame interventions to assess sensitivity to temporal structure. Based on the resulting analysis, we design a targeted…
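To make the two probing ideas named in the abstract more concrete, the following is a minimal sketch of generic logit-lens probing and a frame-order shuffle on toy NumPy arrays. It is not the authors' implementation: the function names (`logit_lens`, `concept_rank_by_layer`, `shuffle_frames`, `toy_hidden_states`), the parameter-free layer norm, and the tensor shapes are all assumptions chosen for illustration, and the "model" is random data rather than a real LVLM.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm over the feature axis (an assumption;
    # real models apply their own learned final normalization).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def logit_lens(hidden_states, unembed):
    # hidden_states: (num_layers, d_model), the residual stream at one
    # token position, read out after each layer.
    # unembed: (d_model, vocab_size), the output projection.
    # Returns per-layer vocabulary logits, shape (num_layers, vocab_size).
    return layer_norm(hidden_states) @ unembed


def concept_rank_by_layer(hidden_states, unembed, token_id):
    # Rank of a concept token at each layer (0 = top prediction).
    logits = logit_lens(hidden_states, unembed)
    return (logits > logits[:, [token_id]]).sum(axis=-1)


def shuffle_frames(visual_tokens, rng):
    # visual_tokens: (num_frames, tokens_per_frame, d_model).
    # Permutes whole frames, keeping each frame's tokens intact, so the
    # content is unchanged and only the temporal order is disturbed.
    perm = rng.permutation(visual_tokens.shape[0])
    return visual_tokens[perm], perm


def toy_hidden_states(unembed, token_id, num_layers, rng):
    # Hypothetical stand-in for a model: each layer moves linearly from
    # noise toward the unembedding direction of the target token.
    d_model = unembed.shape[0]
    noise = rng.standard_normal(d_model)
    target = unembed[:, token_id]
    alphas = np.linspace(0.0, 1.0, num_layers)[:, None]
    return (1.0 - alphas) * noise + alphas * target


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    unembed = rng.standard_normal((64, 50))
    hidden = toy_hidden_states(unembed, token_id=7, num_layers=6, rng=rng)
    print("rank of concept token per layer:",
          concept_rank_by_layer(hidden, unembed, token_id=7))

    frames = rng.standard_normal((8, 4, 64))
    shuffled, perm = shuffle_frames(frames, rng)
    print("frame order after intervention:", perm)
```

In a real analysis, the hidden states would come from the model's intermediate layers at a chosen decoding position, and the effect of `shuffle_frames` would be measured by comparing the model's outputs on the original and shuffled visual token sequences.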
Taxonomy
Topics: Multimodal Machine Learning Applications · Neurobiology of Language and Bilingualism · Action Observation and Synchronization
