Variational Inference in Non-negative Factorial Hidden Markov Models for Efficient Audio Source Separation
Gautham Mysore (Adobe Systems), Maneesh Sahani (University College London)

TL;DR
This paper introduces a Bayesian variant of the non-negative factorial hidden Markov model (N-FHMM) together with a variational inference algorithm for it. The result substantially reduces the computational cost of audio source separation while maintaining comparable separation quality.
Contribution
A novel variational inference algorithm for a Bayesian variant of the N-FHMM. It reduces the complexity of inference from exponential to linear in the number of sound sources, enabling faster audio source separation.
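A rough operation count shows why the change from exponential to linear complexity matters. The sketch below assumes M sources, each with K hidden states, and counts only the dense transition step of a forward recursion for one frame. Exact inference over the joint state space touches (K^M)^2 state pairs, while one variational sweep runs M separate K-state recursions. It ignores the spectral (dictionary) computations and the number of variational iterations, so it illustrates scaling only and does not reproduce the paper's reported speedup. The function names are hypothetical.

```python
def exact_cost_per_frame(num_sources: int, states_per_source: int) -> int:
    """Naive dense forward-step cost over the joint state space.

    The joint chain has K**M states; a dense transition step touches every
    (previous, current) pair of joint states, i.e. (K**M)**2 operations.
    """
    joint_states = states_per_source ** num_sources
    return joint_states ** 2


def variational_cost_per_frame(num_sources: int, states_per_source: int) -> int:
    """Cost of one sweep of per-source forward steps: M chains of K**2 each."""
    return num_sources * states_per_source ** 2


K = 10  # hypothetical number of hidden states per source
for M in (1, 2, 3, 4):
    print(f"M={M}: exact={exact_cost_per_frame(M, K)}, "
          f"variational={variational_cost_per_frame(M, K)}")
```

With K = 10 the two counts are equal (100) for a single source, but for M = 4 they are 10^8 versus 400. Exploiting the factorized transitions lowers the exact count to about M·K^(M+1), which is still exponential in M.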
Findings
Achieves around a 30x speedup over exact inference.
Performs comparably to the original N-FHMM with exact inference in separation quality.
Complexity is linear in the number of sound sources.
Abstract
The past decade has seen substantial work on the use of non-negative matrix factorization and its probabilistic counterparts for audio source separation. Although able to capture audio spectral structure well, these models neglect the non-stationarity and temporal dynamics that are important properties of audio. The recently proposed non-negative factorial hidden Markov model (N-FHMM) introduces a temporal dimension and improves source separation performance. However, the factorial nature of this model makes the complexity of inference exponential in the number of sound sources. Here, we present a Bayesian variant of the N-FHMM suited to an efficient variational inference algorithm, whose complexity is linear in the number of sound sources. Our algorithm performs comparably to exact inference in the original N-FHMM but is significantly faster. In typical configurations of the N-FHMM,…
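The complexity argument in the abstract can be made concrete with a plain discrete factorial HMM rather than the N-FHMM itself. The sketch assumes independent per-source Markov chains. Exact inference then runs one forward-backward pass over the joint chain, whose transition matrix is the Kronecker product of the per-source matrices (K^M states). A structured variational scheme instead runs one K-state pass per source. The observation likelihoods here are random stand-ins. In the actual model they would come from the spectral dictionaries, and in a variational scheme each chain's effective likelihood would depend on the other chains' current posteriors. The paper's update equations are not reproduced, and `forward_backward` and `random_stochastic` are hypothetical helpers, not the authors' code.

```python
from functools import reduce

import numpy as np


def forward_backward(pi, A, lik):
    """Scaled forward-backward for one HMM; cost O(T * S**2).

    pi: (S,) initial distribution; A: (S, S) with A[i, j] = p(next=j | prev=i);
    lik: (T, S) per-frame observation likelihoods.
    Returns the (T, S) posterior state marginals.
    """
    T, S = lik.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    c = np.zeros(T)  # per-frame scaling factors p(x_t | x_1..t-1)
    alpha[0] = pi * lik[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * lik[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (lik[t + 1] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
K, M, T = 3, 3, 50  # hypothetical: states per source, sources, frames


def random_stochastic(rows, cols):
    """Random row-stochastic matrix (each row sums to 1)."""
    P = rng.random((rows, cols))
    return P / P.sum(axis=1, keepdims=True)


pis = [random_stochastic(1, K)[0] for _ in range(M)]
As = [random_stochastic(K, K) for _ in range(M)]

# Exact inference: a single chain over K**M joint states.
joint_A = reduce(np.kron, As)    # shape (K**M, K**M)
joint_pi = reduce(np.kron, pis)  # shape (K**M,)
joint_lik = rng.random((T, K ** M))  # stand-in joint likelihoods
gamma_joint = forward_backward(joint_pi, joint_A, joint_lik)

# Variational-style inference: M chains of K states each, every chain with
# its own effective likelihood (random stand-ins for the mean-field terms).
gammas = [forward_backward(pis[m], As[m], rng.random((T, K))) for m in range(M)]
print(joint_A.shape, [g.shape for g in gammas])
```

The design point is the size of the matrices passed to `forward_backward`: the joint pass works on a 27 x 27 transition matrix here, and that side length grows as K^M, whereas each per-source pass stays at K x K no matter how many sources are added.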
Taxonomy
Topics: Speech and Audio Processing · Blind Source Separation Techniques · Music and Audio Processing
