Abstract Sound Fusion with Unconditional Inversion Models

Jing Liu; Enqi Lian; Moyao Deng

arXiv:2506.11811 · cs.SD · August 5, 2025

Open Access

TL;DR

This paper introduces SDE- and ODE-based inversion models for sound fusion. They preserve essential features of the original sound while enabling controllable synthesis of abstract sounds, without requiring prompt conditioning.

Contribution

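The paper presents new inversion models built on DPMSolver++ samplers. By treating the model outputs as constants during inversion, they remove the circular dependency introduced by the noise-prediction term, and they allow flexible guidance during sampling without prompt conditioning.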

Findings

01

Effective sound fusion with controllable features

02

No prompt conditioning required for inversion

03

Improved inversion process using DPMSolver++

Abstract

An abstract sound is defined as a sound that does not disclose identifiable real-world sound events to a listener. Sound fusion aims to combine an original sound and a reference sound into a novel sound that exhibits auditory features beyond mere additive superposition of the sound constituents. To achieve this fusion, we employ inversion techniques that preserve essential features of the original sample while enabling controllable synthesis. We propose novel SDE and ODE inversion models based on DPMSolver++ samplers that reverse the sampling process by configuring model outputs as constants, eliminating the circular dependencies introduced by noise prediction terms. Our inversion approach requires no prompt conditioning while maintaining flexible guidance during sampling.
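The inversion idea in the abstract can be illustrated with a minimal NumPy sketch. It assumes the first-order DPM-Solver++ update in data-prediction form, a hypothetical cosine noise schedule (`vp_schedule`), and a hypothetical toy denoiser (`toy_model`) that is exact for standard Gaussian data; all function names are illustrative, not the authors' code. Solving the sampler update for the noisier latent would normally place the model output, which depends on that unknown latent, on both sides of the equation. Holding the output fixed as a constant turns each inverse step into a closed-form affine map. Only the ODE case is sketched; the paper's SDE variant and higher-order solvers are not reproduced here.

```python
import numpy as np


def vp_schedule(t):
    """Hypothetical cosine VP schedule: alpha_t**2 + sigma_t**2 == 1, t in (0, 1)."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def step_coeffs(s, t):
    """Coefficients of a first-order DPM-Solver++ step from noisier time s to cleaner time t."""
    alpha_t, sigma_t = vp_schedule(t)
    alpha_s, sigma_s = vp_schedule(s)
    h = np.log(alpha_t / sigma_t) - np.log(alpha_s / sigma_s)  # log-SNR gap, positive for t < s
    return sigma_t / sigma_s, alpha_t * (1.0 - np.exp(-h))


def solver_step(x_s, d, s, t):
    """Sampling update x_t = (sigma_t / sigma_s) * x_s + alpha_t * (1 - exp(-h)) * d."""
    a, b = step_coeffs(s, t)
    return a * x_s + b * d


def solver_invert(x_t, d, s, t):
    """Same update solved for x_s, with the model output d held constant."""
    a, b = step_coeffs(s, t)
    return (x_t - b * d) / a


def invert(x0, model, times):
    """Map a clean sample at times[0] to a latent at times[-1] (times increase with noise)."""
    x, frozen = x0, []
    for t, s in zip(times[:-1], times[1:]):
        d = model(x, t)   # data prediction at the known, cleaner point, used as a constant
        frozen.append(d)  # stored so sampling can replay the same constant
        x = solver_invert(x, d, s, t)
    return x, frozen


def sample(x_T, model, times, frozen=None):
    """Run first-order sampling from times[-1] back to times[0]."""
    x = x_T
    pairs = list(zip(times[:-1], times[1:]))
    for i in reversed(range(len(pairs))):
        t, s = pairs[i]
        d = model(x, s) if frozen is None else frozen[i]
        x = solver_step(x, d, s, t)
    return x


def toy_model(x, t):
    """Hypothetical denoiser: E[x0 | x_t] = alpha_t * x_t is exact for N(0, I) data here."""
    alpha_t, _ = vp_schedule(t)
    return alpha_t * x


# Usage: invert a random "clean" vector, then sample back with the frozen outputs.
rng = np.random.default_rng(0)
times = np.linspace(0.02, 0.98, 50)
x0 = rng.standard_normal(16)
x_T, frozen = invert(x0, toy_model, times)
x_rec = sample(x_T, toy_model, times, frozen)
print(np.max(np.abs(x_rec - x0)))
```

Replaying the frozen outputs makes `sample` an exact inverse of `invert` up to floating-point error. With `frozen=None`, the sampler instead evaluates the model at the noisier point, as ordinary sampling does, and the round trip is only approximate (for this toy model it slightly shrinks `x0`). Whether the paper holds the outputs constant in exactly this way is an assumption of the sketch.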

Taxonomy

Topics: Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing