Accelerating Streaming Video Large Language Models via Hierarchical Token Compression

Yiyu Wang; Xuyang Liu; Xiyan Gui; Xinying Lin; Boxue Yang; Chenfei Liao; Tailai Chen; Linfeng Zhang

arXiv:2512.00891 · cs.CV · February 12, 2026

Open Access

TL;DR

This paper introduces STC, a hierarchical token compression framework that accelerates streaming VideoLLMs by caching and pruning visual tokens. It reduces ViT encoding latency by 24.5% and LLM pre-filling latency by 45.3% while retaining up to 99% accuracy on ReKV.

Contribution

The paper proposes STC, a plug-and-play hierarchical token compression framework that improves the efficiency of streaming VideoLLMs by caching and pruning visual tokens.

Findings

01. STC reduces ViT encoding latency by 24.5%.

02. STC decreases LLM pre-filling latency by 45.3%.

03. STC retains up to 99% accuracy on ReKV.
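
To see how the two stage-level reductions combine, the following is a minimal arithmetic sketch. Only the 24.5% and 45.3% figures come from this page; the function names and any baseline latencies passed in are hypothetical placeholders, not measurements from the paper.

```python
def reduced_latency(baseline_ms: float, reduction_pct: float) -> float:
    """Latency that remains after a relative reduction given in percent."""
    return baseline_ms * (1.0 - reduction_pct / 100.0)


def end_to_end_speedup(
    vit_ms: float,
    prefill_ms: float,
    other_ms: float = 0.0,
    vit_reduction_pct: float = 24.5,      # reported ViT encoding reduction
    prefill_reduction_pct: float = 45.3,  # reported LLM pre-filling reduction
) -> float:
    """Speedup factor over the two accelerated stages plus untouched work.

    vit_ms, prefill_ms and other_ms are hypothetical baseline latencies
    supplied by the caller; they are not values reported in the paper.
    """
    before = vit_ms + prefill_ms + other_ms
    after = (
        reduced_latency(vit_ms, vit_reduction_pct)
        + reduced_latency(prefill_ms, prefill_reduction_pct)
        + other_ms
    )
    return before / after
```

The overall gain depends on how the baseline time is split between the stages: the larger the share spent in LLM pre-filling, the closer the combined speedup gets to the 45.3% figure, and any time in `other_ms` dilutes both reductions.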

Abstract

Streaming Video Large Language Models (VideoLLMs) have demonstrated impressive performance across various video understanding tasks, but they face significant challenges in real-time deployment due to the high computational cost of processing dense visual tokens from continuous video streams. In streaming video scenarios, the primary bottleneck lies in the Vision Transformer (ViT) encoding stage, where redundant processing of temporally similar frames leads to inefficiency. Additionally, inflated token sequences during LLM pre-filling further exacerbate latency and memory overhead. To address these challenges, we propose Streaming Token Compression (STC), a plug-and-play hierarchical framework that seamlessly integrates into existing streaming VideoLLMs, optimizing both ViT encoding and LLM pre-filling stages to accelerate processing. STC introduces…
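
The abstract is truncated before it describes STC's components, so the following is only a toy illustration of the two general ideas named on this page: reusing cached encoder output for temporally similar frames, and pruning visual tokens before LLM pre-filling. Every name here (`CachingEncoder`, `prune_tokens`, `toy_encoder`), the cosine-similarity test, the thresholds, and the change-based pruning rule are assumptions made for illustration, not the authors' algorithm.

```python
import math
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]     # a flattened frame (hypothetical representation)
Tokens = List[List[float]]  # one feature vector per visual token


def cosine_similarity(a: Frame, b: Frame) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


class CachingEncoder:
    """Skips the encoder when a frame is close to the last encoded frame."""

    def __init__(self, encode_fn: Callable[[Frame], Tokens], sim_threshold: float = 0.95):
        self.encode_fn = encode_fn
        self.sim_threshold = sim_threshold  # assumed value, not from the paper
        self.ref_frame: Optional[Frame] = None
        self.ref_tokens: Optional[Tokens] = None
        self.encoder_calls = 0

    def encode(self, frame: Frame) -> Tokens:
        if (
            self.ref_frame is not None
            and self.ref_tokens is not None
            and cosine_similarity(frame, self.ref_frame) >= self.sim_threshold
        ):
            return self.ref_tokens  # cache hit: reuse earlier tokens
        # Cache miss: run the (expensive) encoder and refresh the reference.
        self.ref_frame = frame
        self.ref_tokens = self.encode_fn(frame)
        self.encoder_calls += 1
        return self.ref_tokens


def prune_tokens(tokens: Tokens, prev_tokens: Optional[Tokens], keep_ratio: float = 0.5) -> Tokens:
    """Keep the tokens that changed most versus the previous frame's tokens."""
    if prev_tokens is None or len(prev_tokens) != len(tokens):
        return list(tokens)  # nothing to compare against: keep everything
    n_keep = max(1, round(keep_ratio * len(tokens)))
    change = [math.dist(t, p) for t, p in zip(tokens, prev_tokens)]
    ranked = sorted(range(len(tokens)), key=lambda i: change[i], reverse=True)
    return [tokens[i] for i in sorted(ranked[:n_keep])]  # preserve token order


def toy_encoder(frame: Frame, dim: int = 4) -> Tokens:
    """Stand-in for a ViT: chops the frame into fixed-size chunks."""
    return [list(frame[i:i + dim]) for i in range(0, len(frame), dim)]
```

In this sketch a near-duplicate frame costs one similarity check instead of an encoder pass, and the pruning step shortens the sequence handed to the LLM. A real system would work on ViT features rather than raw pixel chunks and would need a more careful saliency criterion.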

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization