Rethinking Chain-of-Thought Reasoning for Videos
Yiwu Zhong, Zi-Yuan Hu, Yin Li, Liwei Wang

TL;DR
This paper proposes a concise reasoning approach for video MLLMs that uses fewer visual tokens and brief reasoning traces. It improves inference efficiency while keeping performance competitive, without relying on manual CoT annotations or supervised fine-tuning.
Contribution
It introduces an efficient post-training and inference framework that lets video MLLMs reason effectively over compressed visual tokens with brief reasoning traces. This challenges the assumption that lengthy CoT reasoning is necessary.
Findings
Enhanced inference efficiency with compressed visual tokens
Competitive performance across multiple benchmarks
No reliance on manual CoT annotations or supervised fine-tuning
Abstract
Chain-of-thought (CoT) reasoning has been highly successful in solving complex tasks in natural language processing, and recent multimodal large language models (MLLMs) have extended this paradigm to video reasoning. However, these models typically build on lengthy reasoning chains and large numbers of input visual tokens. Motivated by empirical observations from our benchmark study, we hypothesize that concise reasoning combined with a reduced set of visual tokens can be sufficient for effective video reasoning. To evaluate this hypothesis, we design and validate an efficient post-training and inference framework that enhances a video MLLM's reasoning capability. Our framework enables models to operate on compressed visual tokens and generate brief reasoning traces prior to answering. The resulting models achieve substantially improved inference efficiency, deliver competitive…
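The abstract does not say how the visual tokens are compressed. The sketch below is a generic illustration only and is not the authors' method. It assumes spatial average pooling over each frame's grid of patch tokens, and it shows why this reduces the input length. The helper names, the 32-frame input, the 24 x 24 token grid, and the pooling factor of 2 are all hypothetical values chosen for the arithmetic.

```python
from typing import List

Token = List[float]  # one visual token embedding (hypothetical representation)


def pool_frame_tokens(grid: List[List[Token]], factor: int) -> List[List[Token]]:
    """Average-pool an H x W grid of token embeddings with non-overlapping
    factor x factor windows (an assumed compression scheme, for illustration)."""
    h, w = len(grid), len(grid[0])
    if h % factor or w % factor:
        raise ValueError("grid size must be divisible by the pooling factor")
    dim = len(grid[0][0])
    n = factor * factor
    pooled: List[List[Token]] = []
    for i in range(0, h, factor):
        row: List[Token] = []
        for j in range(0, w, factor):
            acc = [0.0] * dim
            # Sum every token embedding inside the current window.
            for di in range(factor):
                for dj in range(factor):
                    tok = grid[i + di][j + dj]
                    for k in range(dim):
                        acc[k] += tok[k]
            # Replace the window with its mean embedding.
            row.append([v / n for v in acc])
        pooled.append(row)
    return pooled


def visual_token_count(num_frames: int, grid_h: int, grid_w: int, factor: int = 1) -> int:
    """Total visual tokens fed to the language model after pooling each frame."""
    return num_frames * (grid_h // factor) * (grid_w // factor)


if __name__ == "__main__":
    full = visual_token_count(32, 24, 24)        # 32 * 576 = 18432 tokens
    pooled = visual_token_count(32, 24, 24, 2)   # 32 * 144 = 4608 tokens
    print(full, pooled, full // pooled)          # 4x fewer visual tokens
```

With these assumed sizes, 2 x 2 pooling cuts the visual input from 18,432 to 4,608 tokens. Because attention cost grows with sequence length, this kind of reduction is where the efficiency gains from token compression come from. The paper's actual compression mechanism may differ.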
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis
