LiveStar: Live Streaming Assistant for Real-World Online Video Understanding

Zhenyu Yang; Kairui Zhang; Yuhang Hu; Bing Wang; Shengsheng Qian; Bin Wen; Fan Yang; Tingting Gao; Weiming Dong; Changsheng Xu

arXiv:2511.05299·cs.CV·November 10, 2025

Open Access

TL;DR

LiveStar is a novel live streaming assistant for real-time online video understanding that improves responsiveness and narrative coherence through adaptive streaming decoding, incremental alignment, and memory-efficient inference.

Contribution

The paper introduces LiveStar, a pioneering online Video-LLM with adaptive decoding, incremental training, and memory optimization, plus the new OmniStar dataset for benchmarking.

Findings

01. Achieves 19.5% higher semantic accuracy than existing methods.

02. Reduces response timing difference by 18.1%.

03. Increases inference speed by 12%.

Abstract

Despite significant progress in Video Large Language Models (Video-LLMs) for offline video understanding, existing online Video-LLMs typically struggle to simultaneously process continuous frame-by-frame inputs and determine optimal response timing, often compromising real-time responsiveness and narrative coherence. To address these limitations, we introduce LiveStar, a pioneering live streaming assistant that achieves always-on proactive responses through adaptive streaming decoding. Specifically, LiveStar incorporates: (1) a training strategy enabling incremental video-language alignment for variable-length video streams, preserving temporal consistency across dynamically evolving frame sequences; (2) a response-silence decoding framework that determines optimal proactive response timing via a single forward pass verification; (3) memory-aware acceleration via peak-end memory…
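The response-silence decision in item (2) can be pictured with a small sketch. The abstract does not state the verification criterion, so everything here is an assumption made for illustration: the perplexity test, the threshold value, and the names `perplexity` and `should_respond` are hypothetical. The idea shown is that one scoring pass over the previous response, conditioned on the newest frame, yields per-token log-probabilities; if the previous response still fits the stream, the assistant stays silent, otherwise it speaks.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of a token sequence from its per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def should_respond(prev_response_logprobs: Sequence[float], threshold: float) -> bool:
    """Hypothetical gate (an assumption, not the paper's stated rule).

    Returns True when the previous response scores poorly against the
    newest frame, i.e. its perplexity exceeds the threshold.
    """
    return perplexity(prev_response_logprobs) > threshold


# Toy log-probabilities standing in for the output of one scoring pass per frame.
still_fits = [-0.1, -0.2, -0.1]   # previous response still describes the scene
stale = [-2.0, -3.0, -2.5]        # scene has changed, previous response is a poor fit

THRESHOLD = 3.0  # illustrative value only

for name, logprobs in [("still_fits", still_fits), ("stale", stale)]:
    action = "respond" if should_respond(logprobs, THRESHOLD) else "stay silent"
    print(f"{name}: perplexity={perplexity(logprobs):.2f} -> {action}")
```

In a real system the log-probabilities would come from the Video-LLM itself, and the threshold would need tuning against a timing metric.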

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning