Communicating Sound Through Natural Language

Emanuele Rossi; Emanuele Rodolà

arXiv:2605.08750·cs.LG·May 12, 2026

TL;DR

This paper introduces lexical acoustic coding (LAC), a framework in which pre-trained language models transmit sound as natural-language text, making the transmitted audio interpretable and editable.

Contribution

The paper presents LAC, a new method for transmitting audio via natural language using interpretable acoustic descriptors and a shared vocabulary, bridging audio and language models.

Findings

01. Plain text preserves measurable acoustic structure.

02. LAC enables interpretable and editable sound communication.

03. Trade-offs exist between vocabulary size, rate, and fidelity.

Abstract

Natural language is widely used to describe, prompt, and control audio systems, but rarely serves as the representation carrying audio itself. We introduce lexical acoustic coding (LAC), a framework in which pre-trained LLM sender and receiver agents transmit sound through natural language. Under fixed system prompts, the agents write their own analysis and synthesis code, communicating only through a lexical sentence, shared vocabulary, and optional symbolic music structure. The sender analyzes an input waveform into interpretable, non-learned acoustic descriptors, quantizes each with a feature-specific interval vocabulary, and verbalizes the lexical code as English. The receiver parses the sentence back into lexical-acoustic constraints and renders a waveform through closed-loop refinement. The transmitted text serves as both a rich caption and the transport representation itself.…
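The quantize, verbalize, and parse steps described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two descriptor names (`pitch_hz`, `loudness_db`), the bin edges, the vocabulary words, the sentence template, and the helper names `encode`, `decode`, and `bits_per_message`. In the paper the agents write their own analysis and synthesis code, and the receiver's closed-loop waveform refinement is not sketched here.

```python
import bisect
import math

# Hypothetical feature-specific interval vocabularies: (bin edges, words).
# len(words) == len(edges) + 1; each word names a half-open interval [lo, hi).
VOCAB = {
    "pitch_hz": ([130.0, 260.0, 520.0], ["low", "mid", "high", "piercing"]),
    "loudness_db": ([-40.0, -20.0], ["quiet", "moderate", "loud"]),
}


def encode(descriptors):
    """Sender side: quantize each descriptor to a word and verbalize a sentence."""
    parts = []
    for name, value in descriptors.items():
        edges, words = VOCAB[name]
        # bisect_right returns the index of the interval containing `value`.
        word = words[bisect.bisect_right(edges, value)]
        parts.append(f"{name} is {word}")
    return "; ".join(parts) + "."


def decode(sentence):
    """Receiver side: parse the sentence back into (lo, hi) interval constraints."""
    constraints = {}
    for clause in sentence.rstrip(".").split("; "):
        name, word = clause.split(" is ")
        edges, words = VOCAB[name]
        i = words.index(word)
        lo = edges[i - 1] if i > 0 else -math.inf
        hi = edges[i] if i < len(edges) else math.inf
        constraints[name] = (lo, hi)
    return constraints


def bits_per_message():
    """Information carried by one sentence if all words were equally likely."""
    return sum(math.log2(len(words)) for _, words in VOCAB.values())


message = encode({"pitch_hz": 220.0, "loudness_db": -12.0})
print(message)
print(decode(message))
print(round(bits_per_message(), 3))
```

The receiver recovers only an interval per descriptor, not the original value, which is where the trade-off between vocabulary size, rate, and fidelity shows up in this toy setting: a larger vocabulary narrows each interval but costs more bits per sentence.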
