Thinking in Streaming Video

Zikang Liu; Longteng Guo; Handong Li; Ru Zhen; Xingjian He; Ruyi Ji; Xiaoming Ren; Yanhao Zhang; Haonan Lu; Jing Liu

arXiv:2603.12938 · cs.CV · March 16, 2026

TL;DR

ThinkStream is a framework for streaming video reasoning that updates its understanding incrementally as new video observations arrive, instead of waiting for the full video, which reduces latency and memory use in dynamic environments.

Contribution

The paper introduces three components for efficient real-time video reasoning: the Watch-Think-Speak paradigm, Reasoning-Compressed Streaming Memory (RCSM), and Streaming Reinforcement Learning.

Findings

01. Outperforms existing online video models in accuracy and speed

02. Maintains low latency and memory usage in streaming scenarios

03. Supports long-horizon streaming reasoning

Abstract

Real-time understanding of continuous video streams is essential for interactive assistants and multimodal agents operating in dynamic environments. However, most existing video reasoning approaches follow a batch paradigm that defers reasoning until the full video context is observed, resulting in high latency and growing computational cost that are incompatible with streaming scenarios. In this paper, we introduce ThinkStream, a framework for streaming video reasoning based on a Watch-Think-Speak paradigm that enables models to incrementally update their understanding as new video observations arrive. At each step, the model performs a short reasoning update and decides whether sufficient evidence has accumulated to produce a response. To support long-horizon streaming, we propose Reasoning-Compressed Streaming Memory (RCSM), which treats intermediate reasoning traces as compact…
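The per-step loop described in the abstract (observe, make a short reasoning update, decide whether to respond) can be illustrated with a toy sketch. This is not the authors' implementation: the function `watch_think_speak`, its callback parameters, and the string-based "frames" are all hypothetical stand-ins chosen only to show the control flow, and the sketch does not model RCSM or the reinforcement learning stage.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def watch_think_speak(
    frames: Iterable[str],
    think: Callable[[List[str], str], str],
    has_enough_evidence: Callable[[List[str]], bool],
    speak: Callable[[List[str]], str],
) -> Optional[Tuple[int, str]]:
    """Toy streaming loop (hypothetical names, not the paper's API).

    Returns (step index, response) at the first step where the evidence
    check passes, or None if the stream ends without a response.
    """
    notes: List[str] = []  # running reasoning notes, one per observed frame
    for step, frame in enumerate(frames):
        # Watch + Think: fold the new observation into a short reasoning note.
        notes.append(think(notes, frame))
        # Speak: respond as soon as the accumulated notes are sufficient,
        # instead of deferring until the whole stream has been seen.
        if has_enough_evidence(notes):
            return step, speak(notes)
    return None


# Illustrative stand-ins for a model's reasoning, decision, and answer steps.
def toy_think(notes: List[str], frame: str) -> str:
    return f"step {len(notes)}: {frame}"


def toy_has_enough_evidence(notes: List[str]) -> bool:
    return "cup" in notes[-1]


def toy_speak(notes: List[str]) -> str:
    return "A cup is visible."


stream = ["empty table", "hand enters", "hand holds cup", "cup on table"]
result = watch_think_speak(stream, toy_think, toy_has_enough_evidence, toy_speak)
```

In this toy run the loop responds at step index 2, before the final frame is observed, which is the contrast with a batch paradigm that would wait for the full stream.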

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Human Pose and Action Recognition