ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence
Sangshin Oh, Seyun Um, Hong-Goo Kang

TL;DR
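This paper introduces ReCAB, a closed-form, divergence-like metric that upper-bounds the KL divergence of a relaxed categorical (Gumbel-softmax) distribution. Because it needs no sampling, it supports more stable and efficient training of variational autoencoders, which the authors demonstrate on emotional speech synthesis.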
Contribution
The paper proposes ReCAB, a divergence-like metric with a closed form that upper-bounds the KLD of a relaxed categorical distribution, and integrates it into a VAE (ReCAB-VAE) that models both continuous and relaxed discrete latent variables.
Findings
ReCAB closely approximates the true KLD.
ReCAB-VAE improves speech quality in emotion control.
The framework trains more stably than sampling-based (stochastic) methods.
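For context on the "stochastic methods" mentioned above, here is a minimal sketch of a sampling-based (Monte Carlo) KLD estimate between two relaxed categorical distributions. It is not ReCAB: the closed-form bound itself is not given on this page. The log-density used is the standard Concrete density from the Gumbel-softmax literature, and every function name (`sample_log_x`, `concrete_log_density`, `mc_kld`) and every parameter value is an illustrative assumption.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sample_log_x(log_alpha, temperature, rng):
    """Log of one Gumbel-softmax sample, kept in log space for stability."""
    noisy = []
    for la in log_alpha:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((la + gumbel) / temperature)
    norm = logsumexp(noisy)
    return [v - norm for v in noisy]


def concrete_log_density(log_x, log_alpha, temperature):
    """Log-density of the Concrete distribution at x = exp(log_x)."""
    n = len(log_alpha)
    const = math.lgamma(n) + (n - 1) * math.log(temperature)  # log((n-1)!) term
    numer = sum(la - (temperature + 1.0) * lx
                for la, lx in zip(log_alpha, log_x))
    denom = n * logsumexp([la - temperature * lx
                           for la, lx in zip(log_alpha, log_x)])
    return const + numer - denom


def mc_kld(log_alpha_q, log_alpha_p, temperature, num_samples, rng):
    """Average log q(x) - log p(x) over samples x drawn from q."""
    total = 0.0
    for _ in range(num_samples):
        log_x = sample_log_x(log_alpha_q, temperature, rng)
        total += concrete_log_density(log_x, log_alpha_q, temperature)
        total -= concrete_log_density(log_x, log_alpha_p, temperature)
    return total / num_samples


rng = random.Random(0)
log_q = [math.log(p) for p in (0.7, 0.2, 0.1)]
log_p = [math.log(1.0 / 3.0)] * 3
estimate = mc_kld(log_q, log_p, temperature=0.5, num_samples=2000, rng=rng)
same = mc_kld(log_q, log_q, temperature=0.5, num_samples=100, rng=rng)
```

The estimate depends on the random draws, so it changes from run to run and adds noise to the training signal. A closed-form quantity such as the one the paper proposes would not have that source of variance.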
Abstract
The Gumbel-softmax distribution, or Concrete distribution, is often used to relax the discrete characteristics of a categorical distribution and enable back-propagation through differentiable reparameterization. Although it reliably yields low-variance gradients, it still relies on a stochastic sampling process for optimization. In this work, we present a relaxed categorical analytic bound (ReCAB), a novel divergence-like metric which corresponds to the upper bound of the Kullback-Leibler divergence (KLD) of a relaxed categorical distribution. The proposed metric is easy to implement because it has a closed-form solution, and empirical results show that it is close to the actual KLD. Along with this new metric, we propose a relaxed categorical analytic bound variational autoencoder (ReCAB-VAE) that successfully models both continuous and relaxed discrete latent representations. We…
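The reparameterization described in the abstract can be sketched as follows. This is a minimal standard-library illustration of drawing one Gumbel-softmax sample, not the authors' implementation. The function name `gumbel_softmax_sample`, the class probabilities, and the temperatures are assumptions chosen for the example.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed categorical sample (a point on the probability simplex)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform; clamp u away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [math.log(p) for p in (0.7, 0.2, 0.1)]  # hypothetical class probabilities
soft = gumbel_softmax_sample(logits, temperature=5.0, rng=rng)
sharp = gumbel_softmax_sample(logits, temperature=0.1, rng=rng)
```

Lower temperatures push samples toward one-hot vectors, while higher temperatures spread mass across classes. The noise enters only through `u`, so the sample is a differentiable function of the logits, which is what allows back-propagation through the sampling step.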
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling
