LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding
Xiaodong Wang, Langling Huang, Zhirong Wu, Xu Zhao, Teng Xu, Xuhong Xia, Peixi Peng

TL;DR
LiViBench is the first omnimodal benchmark for interactive livestream videos, spanning 24 tasks over video, audio, speech, and real-time comments. The paper also develops LiVi-LLM-7B, a model tailored to understanding interactive livestream content.
Contribution
This work presents LiViBench, an omnimodal benchmark for interactive livestream videos built with a semi-automatic, human-in-the-loop annotation workflow, and introduces LiVi-LLM-7B, a model specialized for this domain.
Findings
LiVi-LLM-7B outperforms larger open-source models with up to 72B parameters.
The benchmark covers 24 diverse interactive livestream tasks.
The proposed model narrows the gap with leading proprietary models on LiViBench.
Abstract
The development of multimodal large language models (MLLMs) has advanced general video understanding. However, existing video evaluation benchmarks primarily focus on non-interactive videos, such as movies and recordings. To fill this gap, this paper proposes the first omnimodal benchmark for interactive livestream videos, LiViBench. It features a diverse set of 24 tasks, highlighting the perceptual, reasoning, and livestream-specific challenges. To efficiently construct the dataset, we design a standardized semi-automatic annotation workflow that keeps a human in the loop at multiple stages. The workflow leverages multiple MLLMs to form a multi-agent system for comprehensive video description and uses a seed-question-driven method to construct high-quality annotations. All interactive videos in the benchmark include audio, speech, and real-time comment modalities. To enhance…
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling
