Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction
Chaoqun He, Mingyang Xiang, Yingjing Xu, Bokai Xu, Junbo Cui, Jie Zhou, Yuan Yao, Lijie Wen

TL;DR
Omni-DuplexEval is a benchmark and automatic evaluation framework for real-time duplex multimodal AI systems, covering two complementary scenarios: Real-Time Description and Proactive Reminder.
Contribution
The paper presents a new benchmark with 660 videos and an LLM-based evaluation method for real-time duplex multimodal interactions, addressing a key gap in current evaluation practices.
Findings
State-of-the-art models score only 39.6% overall on the benchmark.
Models perform poorly on proactive reminder tasks, scoring only 20%.
Analysis reveals challenges in balancing response timing and content coherence.
Abstract
Real-time duplex interaction is essential for multimodal AI systems operating in real-world scenarios, where models must continuously process streaming inputs and respond at appropriate moments. However, most existing multimodal large language models (MLLMs) are evaluated in offline settings, where the entire video input is processed before any response is generated. While recent work has started to explore real-time duplex MLLMs, there is still no comprehensive benchmark or automatic evaluation method for this setting. To address this gap, we propose Omni-DuplexEval, a benchmark for systematically evaluating real-time duplex interaction. The benchmark consists of two complementary scenarios: (1) Real-Time Description, which evaluates the ability to generate continuous, time-aligned responses that track evolving multimodal inputs, and (2) Proactive Reminder, which evaluates the ability…
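The contrast the abstract draws between offline and duplex settings can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names (`run_offline`, `run_duplex`), the toy model callbacks, and the string "chunks" standing in for streaming audio/video are all hypothetical, chosen only to show where the two settings differ.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Chunk = str  # hypothetical stand-in for one time step of streaming input


def run_offline(chunks: Sequence[Chunk],
                respond: Callable[[List[Chunk]], str]) -> str:
    """Offline setting: the model sees the whole input, then answers once."""
    return respond(list(chunks))


def run_duplex(chunks: Sequence[Chunk],
               step: Callable[[List[Chunk]], Optional[str]]
               ) -> List[Tuple[int, str]]:
    """Duplex setting: the model is polled at every time step and may
    either speak or stay silent (None). Each reply is timestamped."""
    history: List[Chunk] = []
    replies: List[Tuple[int, str]] = []
    for t, chunk in enumerate(chunks):
        history.append(chunk)
        reply = step(history)  # the model only sees input up to time t
        if reply is not None:
            replies.append((t, reply))
    return replies


# Toy callbacks (assumptions for illustration only).
def toy_respond(all_chunks: List[Chunk]) -> str:
    return f"saw {len(all_chunks)} chunks"


def toy_step(history: List[Chunk]) -> Optional[str]:
    # Speak only when the latest chunk contains the event of interest.
    return "kettle is boiling" if history[-1] == "steam" else None


stream = ["idle", "idle", "steam", "idle"]
offline_result = run_offline(stream, toy_respond)
duplex_result = run_duplex(stream, toy_step)
print(offline_result)
print(duplex_result)
```

In the duplex loop the output carries timestamps, so an evaluator can judge both what was said and when it was said; the offline call returns a single response with no timing information.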
