Communicating Sound Through Natural Language
Emanuele Rossi, Emanuele Rodolà

TL;DR
This paper introduces lexical acoustic coding (LAC), a framework in which pre-trained language models transmit sound as natural-language text, making the transmitted audio interpretable and editable.
Contribution
The paper presents LAC, a new method for transmitting audio via natural language using interpretable acoustic descriptors and a shared vocabulary, bridging audio and language models.
Findings
Plain text preserves measurable acoustic structure.
LAC enables interpretable and editable sound communication.
Trade-offs exist between vocabulary size, rate, and fidelity.
Abstract
Natural language is widely used to describe, prompt, and control audio systems, but rarely serves as the representation carrying audio itself. We introduce lexical acoustic coding (LAC), a framework in which pre-trained LLM sender and receiver agents transmit sound through natural language. Under fixed system prompts, the agents write their own analysis and synthesis code, communicating only through a lexical sentence, shared vocabulary, and optional symbolic music structure. The sender analyzes an input waveform into interpretable, non-learned acoustic descriptors, quantizes each with a feature-specific interval vocabulary, and verbalizes the lexical code as English. The receiver parses the sentence back into lexical-acoustic constraints and renders a waveform through closed-loop refinement. The transmitted text serves as both a rich caption and as the transport representation itself.…
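The sender and receiver steps described in the abstract (analyze, quantize with an interval vocabulary, verbalize, parse) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two descriptors (RMS level and spectral centroid), the words, the interval boundaries, the sentence template, and the function names. The receiver's closed-loop rendering is not shown; the sketch stops at recovering the constraints.

```python
import math
import numpy as np

# Hypothetical feature-specific interval vocabularies (not from the paper).
# Each entry is (word, low, high) for a half-open interval [low, high).
VOCAB = {
    "loudness": [  # RMS level in dBFS
        ("faint", -math.inf, -40.0),
        ("soft", -40.0, -20.0),
        ("loud", -20.0, math.inf),
    ],
    "brightness": [  # spectral centroid in Hz
        ("dark", 0.0, 500.0),
        ("warm", 500.0, 1500.0),
        ("bright", 1500.0, 4000.0),
        ("brilliant", 4000.0, math.inf),
    ],
}

def describe(wave, sr):
    """Sender, step 1: compute two non-learned acoustic descriptors."""
    rms = np.sqrt(np.mean(wave ** 2))
    loudness = 20.0 * math.log10(max(float(rms), 1e-12))
    mag = np.abs(np.fft.rfft(wave))
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sr)
    centroid = float(np.sum(freqs * mag) / max(float(np.sum(mag)), 1e-12))
    return {"loudness": loudness, "brightness": centroid}

def quantize(feature, value):
    """Sender, step 2: map a descriptor value to the word whose interval contains it."""
    for word, low, high in VOCAB[feature]:
        if low <= value < high:
            return word
    raise ValueError(f"{feature}={value} falls outside every interval")

def verbalize(code):
    """Sender, step 3: write the lexical code as an English sentence."""
    return f"The sound is {code['loudness']} and {code['brightness']}."

def parse(sentence):
    """Receiver: turn the sentence back into per-feature interval constraints."""
    tokens = sentence.lower().strip(".").split()
    constraints = {}
    for feature, entries in VOCAB.items():
        for word, low, high in entries:
            if word in tokens:
                constraints[feature] = (low, high)
    return constraints

# Usage: a one-second 440 Hz sine at half amplitude.
sr = 16000
t = np.arange(sr) / sr
wave = 0.5 * np.sin(2 * np.pi * 440.0 * t)
descriptors = describe(wave, sr)
code = {name: quantize(name, value) for name, value in descriptors.items()}
sentence = verbalize(code)
constraints = parse(sentence)
print(sentence)
print(constraints)
```

In this toy version the receiver recovers only an interval per feature, not the original value, which is one way to see the trade-off between vocabulary size, rate, and fidelity listed under Findings: finer intervals need more words but pin the descriptors down more tightly.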
