Expressive Range Characterization of Open Text-to-Audio Models

Jonathan Morse; Azadeh Naderi; Swen Gaudl; Mark Cartwright; Amy K. Hoover; Mark J. Nelson

arXiv:2510.27102 · cs.SD · November 17, 2025

Open Access

TL;DR

This paper adapts expressive range analysis to evaluate the variability and fidelity of open text-to-audio models by analyzing their outputs across key acoustic dimensions for standardized prompts.

Contribution

It introduces a framework for applying expressive range analysis to text-to-audio models, enabling systematic evaluation of their output diversity and quality.

Findings

1. Analyzed model outputs along acoustic dimensions such as pitch and timbre.
2. Demonstrated variability in generated audio for fixed prompts.
3. Provided a new evaluation framework for generative audio models.

Abstract

Text-to-audio models are a type of generative model that produces audio output in response to a given textual prompt. Although level generators and the properties of the functional content that they create (e.g., playability) dominate most discourse in procedural content generation (PCG), games that emotionally resonate with players tend to weave together a range of creative and multimodal content (e.g., music, sounds, visuals, narrative tone), and multimodal models have begun seeing at least experimental use for this purpose. However, it remains unclear what exactly such models generate, and with what degree of variability and fidelity: audio is an extremely broad class of output for a generative system to target. Within the PCG community, expressive range analysis (ERA) has been used as a quantitative way to characterize generators' output space, especially for level generators.…
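Expressive range analysis scores each generated artifact on a small number of metrics and then counts how the outputs fall across a grid of metric bins, so the occupied region of the grid shows how varied a generator's output is. The following is a minimal Python sketch of that idea applied to audio, under stated assumptions: the two metrics (zero-crossing rate as a rough brightness proxy and RMS amplitude as a rough loudness proxy), the helper names, the bin count and ranges, and the sine tones standing in for clips generated from a single prompt are all hypothetical. They are not the acoustic dimensions or the implementation used in the paper.

```python
import math
from typing import List, Sequence, Tuple


# Hypothetical metric 1: RMS amplitude, used here as a crude loudness proxy.
def rms(signal: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in signal) / len(signal))


# Hypothetical metric 2: zero-crossing rate, the fraction of adjacent sample
# pairs whose sign differs; used here as a crude brightness/pitch proxy.
def zero_crossing_rate(signal: Sequence[float]) -> float:
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(signal) - 1)


def _bin_index(value: float, lo: float, hi: float, bins: int) -> int:
    # Map a metric value to a bin index, clamping out-of-range values to the edges.
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)


def expressive_range(
    clips: Sequence[Sequence[float]],
    bins: int = 4,
    x_range: Tuple[float, float] = (0.0, 1.0),
    y_range: Tuple[float, float] = (0.0, 1.0),
) -> List[List[int]]:
    """Count clips per grid cell: columns are ZCR bins, rows are RMS bins."""
    grid = [[0] * bins for _ in range(bins)]
    for clip in clips:
        col = _bin_index(zero_crossing_rate(clip), *x_range, bins)
        row = _bin_index(rms(clip), *y_range, bins)
        grid[row][col] += 1
    return grid


# Hypothetical stand-ins for several clips generated from one fixed prompt.
def sine(freq_hz: float, amp: float, sample_rate: int = 8000, n: int = 800) -> List[float]:
    return [amp * math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


clips = [sine(200, 0.2), sine(200, 0.9), sine(1900, 0.2)]
grid = expressive_range(clips)
for row in grid:
    print(row)
```

Running it prints the rows [1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]: the quiet and loud 200 Hz tones share a ZCR column but land in different RMS rows, while the 1900 Hz tone moves one ZCR column to the right. Many occupied cells would suggest high variability for a prompt, and a single dense cell would suggest outputs clustered in a narrow region. In practice, features closer to the dimensions the paper analyzes, such as estimated pitch or spectral descriptors of timbre, would likely replace these toy metrics.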

Taxonomy

Topics: Music and Audio Processing · Artificial Intelligence in Games · Music Technology and Sound Studies