TL;DR
This paper introduces a scalable hierarchical sampling training algorithm for factorized hierarchical variational autoencoders, enabling effective training on large-scale datasets for improved disentangled representation learning.
Contribution
The paper proposes a hierarchical sampling training algorithm for FHVAE models that addresses the runtime, memory, and hyperparameter optimization limitations of the original training procedure, making training scalable to large datasets.
Findings
Effective training on datasets from 3 to 1,000 hours.
Models demonstrate improved disentanglement and interpretability.
Visualization method aids qualitative evaluation.
Abstract
Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations. Among them, a factorized hierarchical variational autoencoder (FHVAE) is a variational inference-based model that formulates a hierarchical generative process for sequential data. Specifically, an FHVAE model can learn disentangled and interpretable representations, which have been proven useful for numerous speech applications, such as speaker verification, robust speech recognition, and voice conversion. However, as we will elaborate in this paper, the training algorithm proposed in the original paper is not scalable to datasets of thousands of hours, which makes this model less applicable on a larger scale. After identifying limitations in terms of runtime, memory, and hyperparameter optimization,…
Taxonomy
Methods: Interpretability
