Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction

Chaoqun He; Mingyang Xiang; Yingjing Xu; Bokai Xu; Junbo Cui; Jie Zhou; Yuan Yao; Lijie Wen

arXiv:2605.17360 · cs.CV · May 19, 2026

TL;DR

Omni-DuplexEval introduces a benchmark and automatic evaluation framework for assessing real-time duplex multimodal AI systems across two complementary scenarios: Real-Time Description and Proactive Reminder.

Contribution

The paper presents a new benchmark with 660 videos and an LLM-based evaluation method for real-time duplex multimodal interactions, addressing a key gap in current evaluation practices.

Findings

01. State-of-the-art models score only 39.6% overall on the benchmark.

02. Models perform poorly on proactive reminder tasks, scoring only 20%.

03. Analysis reveals challenges in balancing response timing and content coherence.

Abstract

Real-time duplex interaction is essential for multimodal AI systems operating in real-world scenarios, where models must continuously process streaming inputs and respond at appropriate moments. However, most existing multimodal large language models (MLLMs) are evaluated in offline settings, where the entire video input is processed before any response is generated. While recent work has started to explore real-time duplex MLLMs, there is still no comprehensive benchmark or automatic evaluation method for this setting. To address this gap, we propose Omni-DuplexEval, a benchmark for systematically evaluating real-time duplex interaction. The benchmark consists of two complementary scenarios: (1) Real-Time Description, which evaluates the ability to generate continuous, time-aligned responses that track evolving multimodal inputs, and (2) Proactive Reminder, which evaluates the ability…

Code & Models

Repositories

openbmb/Omni-DuplexEval
github

Datasets

Hothan/Omni-DuplexEval
dataset · 329 downloads
