SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction

Tuochao Chen; D Shin; Hakan Erdogan; Sinan Hersek

arXiv:2506.00273·eess.AS·June 3, 2025

Open Access

TL;DR

SoundSculpt is a neural network that extracts target sound fields from ambisonic recordings. It is conditioned on spatial cues and semantic information, and it outperforms signal processing baselines on synthetic and real ambisonic mixtures.

Contribution

The paper introduces an ambisonic-in-ambisonic-out neural network, conditioned on spatial information (a target direction) and semantic embeddings, for extracting target sound fields from ambisonic recordings.

Findings

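01. Spatial conditioning alone can be effective; adding semantic conditioning helps when secondary sound sources are spatially close to the target.

02. Two semantic embeddings, each derived from a text description of the target sound using a text encoder, are compared.

03. SoundSculpt outperforms various signal processing baselines on synthetic and real ambisonic mixtures.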

Abstract

This paper introduces SoundSculpt, a neural network designed to extract target sound fields from ambisonic recordings. SoundSculpt employs an ambisonic-in-ambisonic-out architecture and is conditioned on both spatial information (e.g., target direction obtained by pointing at an immersive video) and semantic embeddings (e.g., derived from image segmentation and captioning). Trained and evaluated on synthetic and real ambisonic mixtures, SoundSculpt demonstrates superior performance compared to various signal processing baselines. Our results further reveal that while spatial conditioning alone can be effective, the combination of spatial and semantic information is beneficial in scenarios where there are secondary sound sources spatially close to the target. Additionally, we compare two different semantic embeddings derived from a text description of the target sound using text encoders.
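
To make the role of spatial conditioning concrete, the following is a minimal sketch of a classical direction-based extractor for first-order ambisonics: a virtual cardioid beam steered at the target direction. It is not SoundSculpt, and it is not claimed to be one of the paper's baselines. The function names (`foa_steering`, `encode_foa`, `cardioid_beam`), the channel convention (ACN order W, Y, Z, X with SN3D normalization), and the anechoic point-source model are all assumptions made for illustration.

```python
import numpy as np

def foa_steering(azimuth, elevation):
    """First-order ambisonic gains for a direction given in radians.

    Assumed convention: ACN channel order (W, Y, Z, X), SN3D normalization.
    """
    return np.array([
        1.0,                                   # W: omnidirectional
        np.sin(azimuth) * np.cos(elevation),   # Y: left-right
        np.sin(elevation),                     # Z: up-down
        np.cos(azimuth) * np.cos(elevation),   # X: front-back
    ])

def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal as an idealized point source; returns shape (4, n)."""
    return np.outer(foa_steering(azimuth, elevation), signal)

def cardioid_beam(foa, azimuth, elevation):
    """Steer a virtual cardioid at the given direction; returns a mono signal.

    A source at angle theta from the look direction is scaled by
    0.5 * (1 + cos(theta)): gain 1 on-axis, gain 0 directly behind.
    """
    return 0.5 * foa_steering(azimuth, elevation) @ foa

# Hypothetical two-source mixture: target in front, interferer directly behind.
n = 16
target = np.sin(np.linspace(0.0, 1.0, n))
interferer = np.cos(np.linspace(0.0, 1.0, n))
mixture = encode_foa(target, 0.0, 0.0) + encode_foa(interferer, np.pi, 0.0)
estimate = cardioid_beam(mixture, 0.0, 0.0)

# Gain applied to an interferer only 20 degrees away from the look direction.
near_gain = 0.5 * (1.0 + np.cos(np.deg2rad(20.0)))
```

In this idealized setting the beam recovers the front target exactly when the interferer is directly behind it, but an interferer 20 degrees off-axis passes almost unattenuated (`near_gain` is about 0.97). This is one way to see why direction alone may not separate spatially close sources, which is the scenario where the abstract reports that semantic information helps.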

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies