One Prompt, Many Sounds: Modeling Listener Variability in LLM-Based Equalization
Ioannis Stylianou, Jon Francombe, Pablo Martinez-Nuevo, Sven Ewan Shepstone, Zheng-Hua Tan

TL;DR
This paper presents an LLM-based method for natural language control of audio equalization, enabling adaptive, conversational sound tuning that aligns with user preferences and context.
Contribution
The paper introduces an LLM-driven approach to natural language audio equalization that applies in-context learning and parameter-efficient fine-tuning to data from a controlled listening experiment, and evaluates preference alignment with distributional metrics.
Findings
Statistically significant improvement in distributional alignment over random sampling and static preset baselines
Models reliably align with population-preferred equalization settings
Results point to the feasibility of conversational sound system control
Abstract
Conventional audio equalization is a static process that requires manual and cumbersome adjustments to adapt to changing listening contexts (e.g., mood, location, or social setting). In this paper, we introduce a Large Language Model (LLM)-based alternative that maps natural language text prompts to equalization settings. This enables a conversational approach to sound system control. By utilizing data collected from a controlled listening experiment, our models exploit in-context learning and parameter-efficient fine-tuning techniques to reliably align with population-preferred equalization settings. Our evaluation methods, which leverage distributional metrics that capture users' varied preferences, show statistically significant improvements in distributional alignment over random sampling and static preset baselines. These results indicate that LLMs could function as "artificial…
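The idea of scoring a model by distributional alignment, rather than by error against a single target, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the abstract does not name the metrics used, so the energy distance stands in as one common choice; the three-band gain vectors (bass, mid, treble, in dB) and all sample data are synthetic; and the helper names are hypothetical.

```python
import math
import random


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (x, y), x from a and y from b."""
    total = sum(math.dist(x, y) for x in a for y in b)
    return total / (len(a) * len(b))


def energy_distance(a, b):
    """Squared energy distance between two samples (V-statistic form).

    Close to 0 when both samples come from the same distribution,
    larger as the distributions move apart.
    """
    return (2 * mean_pairwise_distance(a, b)
            - mean_pairwise_distance(a, a)
            - mean_pairwise_distance(b, b))


def sample_settings(rng, centre, spread, n):
    """Draw n synthetic EQ settings scattered around a centre (gains in dB)."""
    return [[rng.gauss(c, spread) for c in centre] for _ in range(n)]


rng = random.Random(0)
n = 200

# Hypothetical listener preferences for one prompt: [bass, mid, treble] gains.
preferred_centre = [4.0, 0.0, -2.0]
listeners = sample_settings(rng, preferred_centre, 1.5, n)

# Stand-in for a well-aligned model: samples from the same distribution.
model = sample_settings(rng, preferred_centre, 1.5, n)

# Baselines named in the abstract, simulated here with synthetic data.
random_baseline = [[rng.uniform(-12.0, 12.0) for _ in range(3)] for _ in range(n)]
static_preset = [[0.0, 0.0, 0.0] for _ in range(n)]  # flat EQ for every prompt

d_model = energy_distance(model, listeners)
d_random = energy_distance(random_baseline, listeners)
d_preset = energy_distance(static_preset, listeners)

print(f"model vs listeners:  {d_model:.3f}")
print(f"random vs listeners: {d_random:.3f}")
print(f"preset vs listeners: {d_preset:.3f}")
```

A distance of this kind rewards a model for reproducing the spread of listener preferences as well as their centre, which is why a single static preset scores poorly even when it sits near the average setting.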
