ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence

Menghe Ma; Siqing Wei; Yuecheng Xing; Yaheng Wang; Fanhong Meng; Peijun Han; Luu Anh Tuan; Haoran Luo

arXiv:2604.20719 · cs.SD · April 23, 2026

TL;DR

ONOTE is a comprehensive benchmark for evaluating omnimodal notation processing in music AI. It addresses the fragmentation and notation biases of current evaluation and reveals a gap between models' perceptual accuracy and their musical understanding.

Contribution

The paper introduces ONOTE, a deterministic, multi-format benchmark for rigorous evaluation of omnimodal music notation models, highlighting reasoning gaps in current approaches.

Findings

01

Current models show a disconnect between perceptual accuracy and music-theoretic understanding.

02

ONOTE provides a standardized evaluation framework, free of subjective scoring bias, for diverse notation systems.

03

Evaluation exposes fundamental reasoning vulnerabilities in leading omnimodal models.

Abstract

Omnimodal Notation Processing (ONP) represents a unique frontier for omnimodal AI due to the rigorous, multi-dimensional alignment required across auditory, visual, and symbolic domains. Current research remains fragmented, focusing on isolated transcription tasks that fail to bridge the gap between superficial pattern recognition and the underlying musical logic. This landscape is further complicated by severe notation biases toward Western staff and the inherent unreliability of "LLM-as-a-judge" metrics, which often mask structural reasoning failures with systemic hallucinations. To establish a more rigorous standard, we introduce ONOTE, a multi-format benchmark that utilizes a deterministic pipeline, grounded in canonical pitch projection, to eliminate subjective scoring biases across diverse notation systems. Our evaluation of leading omnimodal models exposes a fundamental…
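The abstract names a deterministic pipeline "grounded in canonical pitch projection" but does not spell it out here. The sketch below is one hedged way such an idea could work, and it is not ONOTE's implementation. It assumes MIDI note numbers as the canonical pitch space and uses two example systems: Western staff pitches written as scientific pitch names, and jianpu (numbered notation) written with a hypothetical token format where `'` raises and `,` lowers an octave. It scores outputs with a normalized edit distance instead of an LLM judge. Every function name, token format and the scoring rule are illustrative assumptions.

```python
from typing import List

# Hypothetical sketch of canonical pitch projection; NOT the ONOTE implementation.
# Assumption: MIDI note numbers (C4 = 60) serve as the canonical pitch space.

LETTER_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
MAJOR_DEGREES = [0, 2, 4, 5, 7, 9, 11]  # jianpu degrees 1..7, semitones above the tonic


def staff_to_midi(name: str) -> int:
    """Project a scientific pitch name such as 'C4', 'F#4' or 'Bb3' to MIDI."""
    letter, rest = name[0].upper(), name[1:]
    accidental = 0
    while rest and rest[0] in "#b":  # sharps add a semitone, flats subtract one
        accidental += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    return 12 * (int(rest) + 1) + LETTER_TO_PC[letter] + accidental


def jianpu_to_midi(token: str, tonic_midi: int = 60) -> int:
    """Project a jianpu token ('5', '#4', "1'" = octave up, '6,' = octave down)."""
    original = token
    accidental = 0
    while token and token[0] in "#b":
        accidental += 1 if token[0] == "#" else -1
        token = token[1:]
    if not token or token[0] not in "1234567":  # rests ('0') are not handled here
        raise ValueError(f"unsupported jianpu token: {original!r}")
    octave_shift = token.count("'") - token.count(",")
    return tonic_midi + MAJOR_DEGREES[int(token[0]) - 1] + accidental + 12 * octave_shift


def edit_distance(a: List[int], b: List[int]) -> int:
    """Levenshtein distance between two pitch sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def pitch_accuracy(pred: List[int], ref: List[int]) -> float:
    """Deterministic score in [0, 1]: 1 minus edit distance normalized by the longer length."""
    if not pred and not ref:
        return 1.0
    return 1.0 - edit_distance(pred, ref) / max(len(pred), len(ref))


# The same melody in two notation systems projects to identical canonical pitches.
ref = [staff_to_midi(n) for n in ["C4", "E4", "G4", "C5"]]
pred = [jianpu_to_midi(t) for t in ["1", "3", "5", "1'"]]
print(ref == pred, pitch_accuracy(pred, ref))  # True 1.0
```

Because both notations collapse to the same integer sequence before scoring, the score does not depend on which notation a model wrote in, and the same inputs always yield the same number. That is the property the abstract contrasts with "LLM-as-a-judge" metrics.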
