Abstract Sound Fusion with Unconditional Inversion Models
Jing Liu, Enqi Lian, Moyao Deng

TL;DR
This paper introduces SDE and ODE inversion models, built on DPMSolver++ samplers, for sound fusion. They enable controllable synthesis of abstract sounds without prompt conditioning.
Contribution
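The paper presents new SDE and ODE inversion models based on DPMSolver++ samplers. They reverse the sampling process by treating model outputs as constants, which removes the circular dependencies incurred by noise prediction terms, and they need no prompt conditioning while still allowing flexible guidance during sampling.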
Findings
Effective sound fusion with controllable features
No prompt conditioning required for inversion
Improved inversion process using DPMSolver++
Abstract
An abstract sound is defined as a sound that does not disclose identifiable real-world sound events to a listener. Sound fusion aims to synthesize an original sound and a reference sound to generate a novel sound that exhibits auditory features beyond mere additive superposition of the sound constituents. To achieve this fusion, we employ inversion techniques that preserve essential features of the original sample while enabling controllable synthesis. We propose novel SDE and ODE inversion models based on DPMSolver++ samplers that reverse the sampling process by configuring model outputs as constants, eliminating circular dependencies incurred by noise prediction terms. Our inversion approach requires no prompt conditioning while maintaining flexible guidance during sampling.
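The abstract gives no formulas, so the following is only a minimal sketch of the general idea of reversing a sampler step once the model output is held constant. It assumes a first-order, data-prediction ODE step of the DPM-Solver++ family (equivalent to a DDIM step) and a hypothetical cosine variance-preserving schedule. The names `vp_coeffs`, `sample_step`, and `invert_step` are illustrative and not taken from the paper, whose actual SDE and ODE inversion models may differ. In a real sampler the data prediction depends on the unknown noisier state, which is the circular dependency; fixing it as a constant turns the inverse into plain algebra.

```python
import math

def vp_coeffs(t):
    """Hypothetical variance-preserving schedule (assumption): alpha^2 + sigma^2 = 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

def _coeffs(s, t):
    """Shared coefficients for a step between noise levels s (noisier) and t."""
    alpha_s, sigma_s = vp_coeffs(s)
    alpha_t, sigma_t = vp_coeffs(t)
    # h is the difference in log-SNR, lambda = log(alpha / sigma).
    h = math.log(alpha_t / sigma_t) - math.log(alpha_s / sigma_s)
    return sigma_s, sigma_t, alpha_t * math.expm1(-h)

def sample_step(x_s, x0_pred, s, t):
    """First-order data-prediction step from s to t, given a data prediction x0_pred."""
    sigma_s, sigma_t, c = _coeffs(s, t)
    return [(sigma_t / sigma_s) * xs - c * x0 for xs, x0 in zip(x_s, x0_pred)]

def invert_step(x_t, x0_const, s, t):
    """Algebraic inverse of sample_step, with the model output held constant (assumption)."""
    sigma_s, sigma_t, c = _coeffs(s, t)
    return [(sigma_s / sigma_t) * (xt + c * x0) for xt, x0 in zip(x_t, x0_const)]

# Toy values standing in for a latent and a model output (not real audio data).
x_s = [0.3, -1.2, 0.7]
x0 = [0.1, 0.4, -0.5]
x_t = sample_step(x_s, x0, s=0.8, t=0.6)
x_s_rec = invert_step(x_t, x0, s=0.8, t=0.6)
print(max(abs(a - b) for a, b in zip(x_s, x_s_rec)))
```

With the model output fixed, the inverse step recovers the noisier state up to floating-point error. How closely a practical inversion matches the sampler depends on how well the constant approximates the true model output, which this sketch does not address.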
Taxonomy
Topics: Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing
