Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience

Xilin Jiang, Cong Han, Yinghao Aaron Li, and Nima Mesgarani

arXiv:2402.03710 · eess.AS · June 12, 2025 · 2 cites

TL;DR

This paper presents 'Listen, Chat, and Remix' (LCR), a user-friendly system that uses text prompts and large language models to remix sound mixtures by controlling individual sources without source separation.

Contribution

LCR introduces a novel multimodal sound remixing method that interprets text instructions to control multiple sound sources simultaneously within a mixture.

Findings

01. Significant signal quality improvements across remixing tasks
02. Robust zero-shot performance with diverse sound sources
03. Effective semantic filtering based on user prompts

Abstract

In daily life, we encounter a variety of sounds, both desirable and undesirable, with limited control over their presence and volume. Our work introduces "Listen, Chat, and Remix" (LCR), a novel multimodal sound remixer that controls each sound source in a mixture based on user-provided text instructions. LCR distinguishes itself with a user-friendly text interface and its unique ability to remix multiple sound sources simultaneously within a mixture, without needing to separate them. Users input open-vocabulary text prompts, which are interpreted by a large language model to create a semantic filter for remixing the sound mixture. The system then decomposes the mixture into its components, applies the semantic filter, and reassembles filtered components back to the desired output. We developed a 160-hour dataset with over 100k mixtures, including speech and various audio sources, along…
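
The decompose, filter, and reassemble steps described in the abstract can be illustrated with a toy sketch. This is not the LCR model: it assumes the components are already available as time-domain signals and that the semantic filter reduces to one gain per source, neither of which the abstract specifies. The function name `remix`, the toy signals, and the example prompt are hypothetical, and the gains are written by hand where the real system would have a large language model interpret the prompt.

```python
import numpy as np


def remix(components: np.ndarray, gains: np.ndarray) -> np.ndarray:
    """Apply one gain per source and sum the result into a single signal.

    components: array of shape (num_sources, num_samples)
    gains:      array of shape (num_sources,), a stand-in for the semantic filter
    """
    components = np.asarray(components, dtype=float)
    gains = np.asarray(gains, dtype=float)
    if components.ndim != 2 or gains.shape != (components.shape[0],):
        raise ValueError("expected components (K, T) and gains (K,)")
    # Weighted sum over the source axis: output[t] = sum_k gains[k] * components[k, t]
    return gains @ components


# Toy "mixture" of two known sources (assumption: components are given).
t = np.arange(8000) / 8000.0
speech = np.sin(2 * np.pi * 220 * t)        # stand-in for a speech source
noise = 0.5 * np.sin(2 * np.pi * 1000 * t)  # stand-in for a background source
components = np.stack([speech, noise])
mixture = components.sum(axis=0)

# Hand-written gains for a hypothetical prompt such as
# "remove the background noise and make the speech louder".
gains = np.array([2.0, 0.0])
output = remix(components, gains)
```

With all gains set to 1 the function returns the original mixture, so in this simplified view removal, attenuation, and amplification of sources are all special cases of a single filtering step.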

Taxonomy

Topics: Noise Effects and Management