TL;DR
CAESynth is a real-time audio synthesizer using a conditional autoencoder that enables smooth timbre interpolation and independent pitch control, suitable for musical and environmental sound applications.
Contribution
It introduces a novel conditional autoencoder framework with adversarial regularization for stable timbre interpolation and pitch control in real-time audio synthesis.
Findings
Achieves smooth, high-fidelity real-time audio synthesis
Enables independent control of timbre and pitch
Effective for musical cues and environmental sound exploration
Abstract
In this paper, we present a novel audio synthesizer, CAESynth, based on a conditional autoencoder. CAESynth synthesizes timbre in real-time by interpolating the reference sounds in their shared latent feature space, while controlling pitch independently. We show that training a conditional autoencoder based on accuracy in timbre classification, together with adversarial regularization of pitch content, allows the timbre distribution in latent space to be more effective and stable for timbre interpolation and pitch conditioning. The proposed method is applicable not only to the creation of musical cues but also to the exploration of audio affordance in mixed reality based on novel timbre mixtures with environmental sounds. We demonstrate by experiments that CAESynth achieves smooth and high-fidelity audio synthesis in real-time through timbre interpolation and independent yet accurate pitch control…
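The abstract describes two controls: a blend between reference timbres in a shared latent space, and a pitch condition set independently of that blend. The following is a minimal sketch of how such a decoder input could be assembled, not the authors' implementation. The linear blend, the one-hot pitch encoding, the MIDI range 24 to 84, and all function names are assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import List


def interpolate_timbre(z_a: List[float], z_b: List[float], alpha: float) -> List[float]:
    """Linearly blend two timbre latent vectors (assumed interpolation scheme).

    alpha = 0.0 returns z_a, alpha = 1.0 returns z_b.
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same length")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


def one_hot_pitch(midi_pitch: int, low: int = 24, high: int = 84) -> List[float]:
    """Encode a MIDI pitch as a one-hot vector over a hypothetical range [low, high]."""
    if not low <= midi_pitch <= high:
        raise ValueError("pitch outside the supported range")
    vec = [0.0] * (high - low + 1)
    vec[midi_pitch - low] = 1.0
    return vec


def decoder_input(z_a: List[float], z_b: List[float], alpha: float, midi_pitch: int) -> List[float]:
    """Concatenate the blended timbre code with the pitch condition.

    Changing midi_pitch leaves the timbre part untouched, and changing alpha
    leaves the pitch part untouched, which is the independence the abstract claims.
    """
    return interpolate_timbre(z_a, z_b, alpha) + one_hot_pitch(midi_pitch)
```

In this sketch, `decoder_input([0.0] * 4, [1.0] * 4, 0.25, 60)` yields a vector whose first four entries are the blended timbre code and whose remaining entries mark the requested pitch. A trained decoder, which is not shown here, would map that vector to audio.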
