Preference-Based Learning in Audio Applications: A Systematic Analysis
Aaron Broukhim, Yiran Shen, Prithviraj Ammanabrolu, Nadir Weibel

TL;DR
This systematic review highlights the emerging role of preference learning in audio applications, emphasizing recent shifts towards generation tasks and the need for standardized benchmarks and datasets.
Contribution
The paper provides a comprehensive analysis of how sparsely preference learning is applied in audio, identifying key patterns and future research directions.
Findings
Preference learning is underutilized in audio: only 30 of approximately 500 reviewed papers (6%) apply it.
Post-2021 studies focus on generation tasks using RLHF frameworks.
Multi-stage training pipelines and multi-dimensional evaluation strategies are emerging.
Abstract
Despite the parallel challenges that audio and text domains face in evaluating generative model outputs, preference learning remains remarkably underexplored in audio applications. Through a PRISMA-guided systematic review of approximately 500 papers, we find that only 30 (6%) apply preference learning to audio tasks. Our analysis reveals a field in transition: pre-2021 works focused on emotion recognition using traditional ranking methods (rankSVM), while post-2021 studies have pivoted toward generation tasks employing modern RLHF frameworks. We identify three critical patterns: (1) the emergence of multi-dimensional evaluation strategies combining synthetic, automated, and human preferences; (2) inconsistent alignment between traditional metrics (WER, PESQ) and human judgments across different contexts; and (3) convergence on multi-stage training pipelines that combine reward signals.…
Taxonomy
Topics: Music and Audio Processing · Emotion and Mood Recognition · Neuroscience and Music Perception
