A Generative Model for Natural Sounds Based on Latent Force Modelling
William J. Wilkinson, Joshua D. Reiss, Dan Stowell

TL;DR
This paper introduces a probabilistic generative model for natural sounds that incorporates physical knowledge of amplitude envelope behavior, leading to more realistic sound synthesis compared to existing methods.
Contribution
It extends latent force modeling by explicitly capturing correlations over multiple time steps, improving the physical interpretability and realism of sound generation.
Findings
Sounds generated by the model are perceived as more realistic than those from existing methods.
The model outperforms NMF-based models in perceived realism despite having a higher reconstruction error.
Incorporating physical priors enhances the interpretability of latent functions.
Abstract
Recent advances in analysis of subband amplitude envelopes of natural sounds have resulted in convincing synthesis, showing subband amplitudes to be a crucial component of perception. Probabilistic latent variable analysis is particularly revealing, but existing approaches don't incorporate prior knowledge about the physical behaviour of amplitude envelopes, such as exponential decay and feedback. We use latent force modelling, a probabilistic learning paradigm that incorporates physical knowledge into Gaussian process regression, to model correlation across spectral subband envelopes. We augment the standard latent force model approach by explicitly modelling correlations over multiple time steps. Incorporating this prior knowledge strengthens the interpretation of the latent functions as the source that generated the signal. We examine this interpretation via an experiment which shows…
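To make the idea of a latent force model concrete, here is a minimal sketch in Python of a generic first-order version: a single latent function drawn from a Gaussian process drives several subband envelopes, each of which decays exponentially in the absence of input. The function names (`sample_gp`, `simulate_envelopes`), the squared-exponential kernel, the softplus nonlinearity, and all parameter values are assumptions made for illustration. None of them are taken from the paper, and the sketch omits the paper's multi-step feedback terms and its inference procedure.

```python
import numpy as np


def sample_gp(t, lengthscale=0.1, variance=1.0, seed=0):
    """Draw one sample from a zero-mean GP with a squared-exponential kernel.

    The kernel choice and hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    diff = t[:, None] - t[None, :]
    K = variance * np.exp(-0.5 * (diff / lengthscale) ** 2)
    K += 1e-6 * np.eye(len(t))  # jitter for numerical stability
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal(len(t))


def simulate_envelopes(t, u, decay, sensitivity):
    """Euler-integrate dx_m/dt = -decay_m * x_m + sensitivity_m * g(u).

    x_m is the amplitude envelope of subband m, and g is a softplus that
    keeps the driving force non-negative (an assumption of this sketch).
    """
    dt = t[1] - t[0]
    force = np.log1p(np.exp(u))
    x = np.zeros((len(t), len(decay)))
    for k in range(1, len(t)):
        dx = -decay * x[k - 1] + sensitivity * force[k - 1]
        x[k] = x[k - 1] + dt * dx
    return x


# Example: one shared latent force driving three hypothetical subbands.
t = np.linspace(0.0, 1.0, 200)
u = sample_gp(t)
decay = np.array([5.0, 10.0, 20.0])       # per-subband decay rates (hypothetical)
sensitivity = np.array([1.0, 0.5, 2.0])   # per-subband coupling to the force (hypothetical)
envelopes = simulate_envelopes(t, u, decay, sensitivity)
```

Because every subband is driven by the same latent function, the simulated envelopes are correlated across frequency, while the decay term gives each one the exponential fall-off that the abstract refers to as prior physical knowledge.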
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Music and Audio Processing · Speech and Audio Processing
