UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models

Chen Chen; ZeYang Hu; Fengjiao Chen; Liya Ma; Jiaxing Liu; Xiaoyu Li; Ziwen Wang; Xuezhi Cao; Xunliang Cai

arXiv:2510.18915 · cs.CL · October 31, 2025

TL;DR

This paper introduces UNO-Bench, a comprehensive benchmark for evaluating the uni-modal and omni-modal capabilities of omni models across diverse tasks and for studying how the two relate, revealing insights into their compositional performance and potential bottlenecks.

Contribution

The paper presents UNO-Bench, a unified benchmark with new datasets and evaluation methods for assessing the uni-modal and omni-modal capabilities of omni models, which helps clarify how these capabilities compose.

Findings

01

In weak models, the relationship between uni-modal and omni-modal performance shows a bottleneck effect.

02

In strong models, uni-modal and omni-modal capabilities show a synergistic effect.

03

The benchmark covers 44 task types and 5 modality combinations.

Abstract

Multimodal Large Language Models (MLLMs) have been progressing from uni-modal understanding toward unifying the visual, audio, and language modalities; such models are collectively termed omni models. However, the correlation between uni-modal and omni-modal capabilities remains unclear, and a comprehensive evaluation is needed to drive the evolution of omni model intelligence. In this work, we introduce UNO-Bench, a novel, high-quality UNified Omni model benchmark. It is designed to evaluate both UNi-modal and Omni-modal capabilities under a unified ability taxonomy spanning 44 task types and 5 modality combinations. It includes 1,250 human-curated omni-modal samples with 98% cross-modality solvability and 2,480 enhanced uni-modal samples. The human-generated dataset is well suited to real-world scenarios, particularly in the Chinese context, whereas the automatically compressed dataset offers a…
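
Because the benchmark organises samples by task type and by modality combination, results are naturally reported per group. The Python sketch below is a minimal illustration of that bookkeeping, not UNO-Bench's official evaluation code: the `ScoredSample` record, its field names, the `accuracy_by` helper, and the task and modality labels are all hypothetical placeholders, and each sample is assumed to already carry a correct/incorrect verdict from some scorer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScoredSample:
    # Hypothetical record layout; the real UNO-Bench fields may differ.
    task_type: str  # task-type label (UNO-Bench defines 44 task types)
    modality: str   # modality-combination label (illustrative names only)
    correct: bool   # verdict assigned by a scorer


def accuracy_by(samples, key):
    """Return {group: accuracy}, grouping samples by the attribute named `key`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for s in samples:
        group = getattr(s, key)
        totals[group] += 1
        hits[group] += int(s.correct)
    # Plain dict, preserving first-seen group order.
    return {g: hits[g] / totals[g] for g in totals}


# Toy data with made-up labels, for illustration only.
samples = [
    ScoredSample("counting", "image", True),
    ScoredSample("counting", "image+audio", False),
    ScoredSample("speech QA", "audio", True),
    ScoredSample("speech QA", "image+audio", True),
]
print(accuracy_by(samples, "modality"))   # {'image': 1.0, 'image+audio': 0.5, 'audio': 1.0}
print(accuracy_by(samples, "task_type"))  # {'counting': 0.5, 'speech QA': 1.0}
```

Grouping by `modality` yields one score per modality combination, and grouping by `task_type` yields one per task type; a single helper covers both because the grouping key is just an attribute name.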

Code & Models

Models

AGI-Eval/UNO-Scorer-Qwen3-14B (Hugging Face model)

Datasets

meituan-longcat/UNO-Bench (Hugging Face dataset)