Simi-SFX: A similarity-based conditioning method for controllable sound effect synthesis
Yunyi Liu, Craig Jin

TL;DR
This paper introduces Simi-SFX, a novel similarity-based conditioning method for controllable sound effect synthesis that leverages differentiable digital signal processing and pre-trained audio models to achieve fine-grained timbre control.
Contribution
It proposes a new similarity-based conditioning framework that combines DDSP with latent-space timbre control, enabling expressive sound effect synthesis with subtle timbral variations.
Findings
Demonstrates effective control of timbre variations
Enables timbre interpolation between classes
Validates controllability and sound quality on benchmark datasets
Abstract
Generating sound effects with controllable variations is a challenging task, traditionally addressed using sophisticated physical models that require in-depth knowledge of signal processing parameters and algorithms. In the era of generative and large language models, text has emerged as a common, human-interpretable interface for controlling sound synthesis. However, the discrete and qualitative nature of language tokens makes it difficult to capture subtle timbral variations across different sounds. In this research, we propose a novel similarity-based conditioning method for sound synthesis, leveraging differentiable digital signal processing (DDSP). This approach combines the use of latent space for learning and controlling audio timbre with an intuitive guiding vector, normalized within the range [0,1], to encode categorical acoustic information. By utilizing pre-trained audio…
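To make the idea of a guiding vector "normalized within the range [0,1]" concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' method. It assumes each sound class is represented by a prototype embedding, for example the mean embedding produced by a pre-trained audio model. Each entry of the vector is then the cosine similarity between a clip's embedding and one class prototype, rescaled from [-1, 1] to [0, 1]. The functions `guiding_vector` and `interpolate`, the class names, and the toy 2-D embeddings are all hypothetical. The sketch also shows one simple way to interpolate between two classes: a linear blend of two guiding vectors.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def guiding_vector(embedding: List[float],
                   class_prototypes: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical guiding vector: one entry per class.

    Each entry maps cosine similarity from [-1, 1] to [0, 1] via (s + 1) / 2.
    """
    return {
        name: (cosine_similarity(embedding, proto) + 1.0) / 2.0
        for name, proto in class_prototypes.items()
    }


def interpolate(g_a: Dict[str, float], g_b: Dict[str, float],
                alpha: float) -> Dict[str, float]:
    """Linear blend of two guiding vectors; entries stay in [0, 1] for alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {k: (1.0 - alpha) * g_a[k] + alpha * g_b[k] for k in g_a}


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for pre-trained audio model outputs.
    prototypes = {"rain": [1.0, 0.0], "wind": [0.0, 1.0]}
    g_rain = guiding_vector([1.0, 0.0], prototypes)
    g_wind = guiding_vector([0.0, 1.0], prototypes)
    print(g_rain)                             # {'rain': 1.0, 'wind': 0.5}
    print(interpolate(g_rain, g_wind, 0.5))   # {'rain': 0.75, 'wind': 0.75}
```

The (s + 1) / 2 rescaling is one simple choice that keeps every entry in [0, 1]. Min-max or softmax normalization across classes would be plausible alternatives. The abstract does not say which scheme the paper uses.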
Taxonomy
Topics: Music Technology and Sound Studies · Acoustic Wave Phenomena Research · Music and Audio Processing
