LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding

Xiaodong Wang; Langling Huang; Zhirong Wu; Xu Zhao; Teng Xu; Xuhong Xia; Peixi Peng

arXiv:2601.15016 · cs.CV · January 22, 2026

TL;DR

LiViBench introduces the first comprehensive benchmark for interactive livestream videos, incorporating diverse tasks and modalities, and develops LiVi-LLM-7B, a model tailored for understanding interactive livestream content.

Contribution

This work presents LiViBench, a novel omnimodal benchmark for interactive livestream videos, and introduces LiVi-LLM-7B, a specialized large language model with enhanced capabilities for this domain.

Findings

01. LiVi-LLM-7B outperforms larger open-source models with up to 72B parameters.

02. The benchmark covers 24 diverse interactive livestream tasks.

03. The proposed model narrows the gap with leading proprietary models on LiViBench.

Abstract

The development of multimodal large language models (MLLMs) has advanced general video understanding. However, existing video evaluation benchmarks primarily focus on non-interactive videos, such as movies and recordings. To fill this gap, this paper proposes the first omnimodal benchmark for interactive livestream videos, LiViBench. It features a diverse set of 24 tasks, highlighting perceptual, reasoning, and livestream-specific challenges. To construct the dataset efficiently, we design a standardized semi-automatic annotation workflow that keeps a human in the loop at multiple stages. The workflow leverages multiple MLLMs to form a multi-agent system for comprehensive video description and uses a seed-question-driven method to construct high-quality annotations. All interactive videos in the benchmark include audio, speech, and real-time comment modalities. To enhance…
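The annotation workflow summarized in the abstract (several MLLM "agents" describing a video, seed questions driving annotation, and human checks along the way) can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' pipeline: every name below (`describe_video`, `draft_candidates`, `annotate`, the stub agents, the task labels, and the reviewer rule) is hypothetical, and the MLLM calls are replaced by plain Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types: the paper's real interfaces are not given in this summary.
Describer = Callable[[str], str]  # video id -> description from one "agent"


@dataclass
class Candidate:
    video_id: str
    task: str
    question: str
    context: str


def describe_video(video_id: str, agents: Dict[str, Describer]) -> str:
    """Merge per-agent descriptions into one labelled context string."""
    return "\n".join(f"[{name}] {agent(video_id)}" for name, agent in agents.items())


def draft_candidates(video_id: str, context: str,
                     seed_questions: Dict[str, str]) -> List[Candidate]:
    """Turn each seed question (keyed by task) into a candidate annotation."""
    return [Candidate(video_id, task, question, context)
            for task, question in seed_questions.items()]


def annotate(video_ids: List[str], agents: Dict[str, Describer],
             seed_questions: Dict[str, str],
             reviewer: Callable[[Candidate], bool]) -> List[Candidate]:
    """Describe, draft, then keep only candidates the reviewer accepts."""
    kept: List[Candidate] = []
    for video_id in video_ids:
        context = describe_video(video_id, agents)
        for candidate in draft_candidates(video_id, context, seed_questions):
            if reviewer(candidate):  # stand-in for a human-in-the-loop check
                kept.append(candidate)
    return kept


# Stub agents standing in for MLLMs, one per modality (assumed split).
agents = {
    "visual": lambda v: f"streamer on screen in {v}",
    "speech": lambda v: f"transcript of {v}",
    "comments": lambda v: f"viewer comments for {v}",
}
# Assumed task labels and seed questions, for illustration only.
seed_questions = {
    "comment_understanding": "What are viewers reacting to?",
    "needs_rework": "",
}
# A toy reviewer rule: reject candidates whose question is empty.
kept = annotate(["stream_001"], agents, seed_questions,
                reviewer=lambda c: bool(c.question))
```

In this toy run the reviewer drops the candidate with an empty question, so `kept` holds a single annotation whose context carries all three labelled descriptions. A real workflow would replace the stubs with model calls and the lambda with actual human review.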

Code & Models

Datasets

Xiaodong/LiViBench
dataset · 7 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling