PolyBench: A Benchmark for Compositional Reasoning in Polyphonic Audio

Yuanjian Chen; Yang Xiao; Han Yin; Xubo Liu; Jinjie Huang; Ting Dang

arXiv:2603.05128 · eess.AS · March 11, 2026

TL;DR

PolyBench is a benchmark for evaluating how well Large Audio Language Models (LALMs) perform compositional reasoning in polyphonic audio, where multiple sound events co-occur. Existing benchmarks cover this setting only to a limited extent.

Contribution

This work introduces PolyBench, a comprehensive benchmark with five subsets to assess reasoning over multiple concurrent sound events in polyphonic audio.

Findings

01. State-of-the-art LALMs show performance degradation on PolyBench

02. PolyBench reveals fundamental bottlenecks in current LALMs for polyphonic reasoning

03. Benchmark covers counting, classification, detection, concurrency, and duration estimation

Abstract

Large Audio Language Models (LALMs) are increasingly capable of reasoning over audio. However, existing benchmarks provide limited coverage of reasoning in polyphonic audio, where multiple sound events co-occur and induce compositional structure. In this work, we introduce PolyBench, a benchmark designed to evaluate compositional reasoning in polyphonic audio. PolyBench comprises five evaluation subsets covering counting, classification, detection, concurrency, and duration estimation, requiring reasoning over multiple concurrent events and their relations. Evaluation of state-of-the-art LALMs reveals consistent performance degradation in polyphonic audio, indicating a fundamental bottleneck in current LALMs.
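
As an illustration of how results over the five subsets might be tabulated, here is a minimal sketch that computes exact-match accuracy per subset. The record schema (`subset`, `prediction`, `answer`), the subset identifiers, the `normalize` helper, and the choice of exact-match accuracy are all assumptions made for this example; the abstract does not specify the data format or the scoring metric PolyBench uses.

```python
from collections import defaultdict

# Subset names follow the abstract; the exact identifiers are an assumption.
SUBSETS = ("counting", "classification", "detection", "concurrency", "duration")


def normalize(text):
    """Hypothetical answer normalization: trim whitespace, lowercase."""
    return str(text).strip().lower()


def per_subset_accuracy(records):
    """Compute exact-match accuracy for each subset present in `records`.

    `records` is an iterable of dicts with the (assumed) keys
    'subset', 'prediction', and 'answer'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        subset = record["subset"]
        if subset not in SUBSETS:
            raise ValueError(f"unknown subset: {subset!r}")
        total[subset] += 1
        if normalize(record["prediction"]) == normalize(record["answer"]):
            correct[subset] += 1
    # Report only subsets that actually have examples.
    return {s: correct[s] / total[s] for s in SUBSETS if total[s]}


if __name__ == "__main__":
    # Toy records for illustration only; these are not PolyBench data.
    toy_records = [
        {"subset": "counting", "prediction": "3", "answer": "3"},
        {"subset": "counting", "prediction": "2", "answer": "3"},
        {"subset": "detection", "prediction": "Yes", "answer": "yes"},
    ]
    for subset, accuracy in per_subset_accuracy(toy_records).items():
        print(f"{subset}: {accuracy:.2f}")
```

Reporting accuracy per subset rather than as a single pooled number keeps a weakness in one skill (for example, counting) from being masked by strength in another.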

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis