StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

Haibo Wang; Bo Feng; Zhengfeng Lai; Mingze Xu; Shiyu Li; Weifeng Ge; Afshin Dehghan; Meng Cao; Ping Huang

arXiv:2505.05467 · cs.CV · September 22, 2025

TL;DR

StreamBridge is a framework that adds streaming capabilities to offline Video-LLMs, enabling real-time multi-turn understanding and proactive responses. It is supported by a new dataset, Stream-IT, and extensive experiments.

Contribution

StreamBridge introduces a memory buffer with round-decayed compression and a decoupled, lightweight activation model, which together turn offline Video-LLMs into streaming assistants.
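To illustrate what a decoupled activation model can look like in a streaming loop, here is a minimal sketch. It assumes the activation model can be treated as a cheap scoring function called on every incoming frame, and that the main Video-LLM is only invoked when the score crosses a threshold. The names `stream_with_activation`, `activation_score`, `generate_response`, and the `threshold` value are hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Frame = List[float]  # placeholder type for one decoded video frame (assumption)


def stream_with_activation(
    frames: Iterable[Frame],
    activation_score: Callable[[List[Frame]], float],
    generate_response: Callable[[List[Frame]], str],
    threshold: float = 0.5,  # hypothetical trigger threshold
) -> Iterator[Tuple[int, str]]:
    """Yield (frame_index, response) whenever the activation score triggers.

    The lightweight scorer runs on every frame; the expensive Video-LLM
    call happens only on the frames where the scorer fires.
    """
    seen: List[Frame] = []
    for t, frame in enumerate(frames):
        seen.append(frame)
        # Cheap per-frame check by the (hypothetical) activation model.
        if activation_score(seen) >= threshold:
            # Expensive generation by the main model, only when triggered.
            yield t, generate_response(seen)
```

Because the scorer and the generator are passed in as separate callables, either one can be swapped without touching the other, which is the practical benefit of keeping the two decoupled.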

Findings

01 Significantly improves streaming understanding in Video-LLMs
02 Outperforms proprietary models like GPT-4o and Gemini 1.5 Pro
03 Achieves competitive results on standard benchmarks

Abstract

We present StreamBridge, a simple yet effective framework that seamlessly transforms offline Video-LLMs into streaming-capable models. It addresses two fundamental challenges in adapting existing models to online scenarios: (1) limited capability for multi-turn real-time understanding, and (2) lack of proactive response mechanisms. Specifically, StreamBridge incorporates (1) a memory buffer combined with a round-decayed compression strategy, supporting long-context multi-turn interactions, and (2) a decoupled, lightweight activation model that can be effortlessly integrated into existing Video-LLMs, enabling continuous proactive responses. To further support StreamBridge, we construct Stream-IT, a large-scale dataset tailored for streaming video understanding, featuring interleaved video-text sequences and diverse instruction formats. Extensive experiments show that StreamBridge…
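The memory buffer with round-decayed compression can be pictured with a small sketch. It rests on an assumption about the mechanism that this page does not spell out: when the buffered visual tokens exceed a budget, older interaction rounds are compressed first by mean-pooling adjacent tokens, so recent rounds keep the most detail. The class `MemoryBuffer`, the helper `mean_pool_pairs`, and the `max_tokens` budget are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List

Token = List[float]  # one visual token embedding (assumption: a plain float vector)


def mean_pool_pairs(tokens: List[Token]) -> List[Token]:
    """Roughly halve the token count by averaging adjacent pairs.

    An unpaired trailing token is kept unchanged.
    """
    pooled = [
        [(a + b) / 2.0 for a, b in zip(tokens[i], tokens[i + 1])]
        for i in range(0, len(tokens) - 1, 2)
    ]
    if len(tokens) % 2:
        pooled.append(list(tokens[-1]))
    return pooled


@dataclass
class MemoryBuffer:
    max_tokens: int  # hypothetical total token budget
    rounds: List[List[Token]] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(len(r) for r in self.rounds)

    def append(self, tokens: List[Token]) -> None:
        """Store a new round, then compress older rounds if over budget."""
        self.rounds.append(tokens)
        self._compress()

    def _compress(self) -> None:
        # Walk from the oldest round to the newest, pooling each round
        # until the budget is met or the round is down to one token.
        for i in range(len(self.rounds)):
            while self.total_tokens() > self.max_tokens and len(self.rounds[i]) > 1:
                self.rounds[i] = mean_pool_pairs(self.rounds[i])
            if self.total_tokens() <= self.max_tokens:
                return
```

In this sketch the newest round is only touched after every earlier round has been reduced to a single token, which is one simple way to make compression strength decay with recency.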

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis