SUNAC: Source-aware Unified Neural Audio Codec

Ryo Aihara; Yoshiki Masuyama; Francesco Paissan; François G. Germain; Gordon Wichern; Jonathan Le Roux

arXiv:2511.16126·eess.AS·November 21, 2025

TL;DR

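SUNAC is a neural audio codec that encodes individual sources directly from mixtures, conditioned on source-type prompts. It supports source-specific downstream processing with quality competitive with, and computational cost lower than, a cascade of source separation followed by a conventional codec.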

Contribution

The paper introduces a source-aware neural audio codec that encodes sources directly from mixtures, conditioned on source-type prompts, so that users can choose which sources to encode without running a separate separation stage first.

Findings

01 Achieves competitive resynthesis quality
02 Enables user-driven source selection
03 Reduces computational cost compared to cascaded methods

Abstract

Neural audio codecs (NACs) provide compact representations that can be leveraged in many downstream applications, in particular large language models. Yet most NACs encode mixtures of multiple sources in an entangled manner, which may impede efficient downstream processing in applications that need access to only a subset of the sources (e.g., analysis of a particular type of sound, transcription of a given speaker, etc.). To address this, we propose a source-aware codec that encodes individual sources directly from mixtures, conditioned on source type prompts. This enables user-driven selection of which source(s) to encode, including separately encoding multiple sources of the same type (e.g., multiple speech signals). Experiments show that our model achieves competitive resynthesis and separation quality relative to a cascade of source separation followed by a conventional NAC, with…
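The calling convention described in the abstract (one mixture in, one token stream out per requested source prompt) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two-band "mixture" frames, the `PROMPT_WEIGHTS` table standing in for learned conditioning, and the function names `quantize` and `encode_sources` are hypothetical and are not SUNAC's architecture or API. The sketch also does not cover the case of several sources of the same type.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-prompt band weights standing in for learned conditioning.
# Each frame of the toy "mixture" is a pair (low_band_energy, high_band_energy).
PROMPT_WEIGHTS: Dict[str, Tuple[float, float]] = {
    "speech": (1.0, 0.0),  # assumption: toy speech lives in the low band
    "music": (0.0, 1.0),   # assumption: toy music lives in the high band
}


def quantize(value: float, levels: int = 8, max_value: float = 1.0) -> int:
    """Map a value in [0, max_value] to an integer code in [0, levels - 1]."""
    clipped = min(max(value, 0.0), max_value)
    return min(int(clipped / max_value * levels), levels - 1)


def encode_sources(
    mixture: Sequence[Tuple[float, float]], prompts: Sequence[str]
) -> Dict[str, List[int]]:
    """Return one token stream per requested prompt, computed from the mixture.

    Only the prompted sources are encoded; nothing is computed for the rest.
    """
    streams: Dict[str, List[int]] = {}
    for prompt in prompts:
        w_low, w_high = PROMPT_WEIGHTS[prompt]
        streams[prompt] = [
            quantize(w_low * low + w_high * high) for low, high in mixture
        ]
    return streams


# Toy mixture of three frames.
mixture = [(0.9, 0.1), (0.5, 0.6), (0.0, 0.3)]

# A downstream task that only needs speech requests only the speech stream.
speech_only = encode_sources(mixture, ["speech"])

# Requesting both prompts yields two separate, non-entangled streams.
both = encode_sources(mixture, ["speech", "music"])
```

In this toy, a transcription-style consumer would read only `speech_only["speech"]`, whereas a cascaded pipeline would first separate every source and then run a conventional codec on each separated signal.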

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Generative Adversarial Networks and Image Synthesis