Improved Variational Inference in Discrete VAEs using Error Correcting Codes
María Martínez-García, Grace Villacrés, David Mitchell, Pablo M. Olmos

TL;DR
This paper introduces a novel approach to improve discrete Variational Autoencoders by integrating Error-Correcting Codes, enhancing inference accuracy, generation quality, and uncertainty calibration through a communication system perspective.
Contribution
It proposes using Error-Correcting Codes in discrete VAEs to reduce the variational gap and improve inference, a novel perspective in deep probabilistic modeling.
Findings
Significant improvements in generation quality and data reconstruction.
Enhanced uncertainty calibration in discrete VAEs.
Outperforms models trained with tighter bounds like IWAE.
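For context, the terms "variational gap" and "tighter bounds" used above have standard textbook definitions. The forms below are generic (with $z$ denoting the latent variable and $K$ the number of importance samples) and are not taken from the paper's own notation:

```latex
% Variational gap: the slack between the log-evidence and the ELBO
\log p(x) - \mathcal{L}_{\mathrm{ELBO}}(x)
  = \mathrm{KL}\big(q(z \mid x)\,\|\,p(z \mid x)\big) \;\ge\; 0

% IWAE bound with K importance samples; K = 1 recovers the ELBO
\mathcal{L}_{K}(x)
  = \mathbb{E}_{z_1,\dots,z_K \sim q(z \mid x)}
    \left[\log \frac{1}{K}\sum_{k=1}^{K} \frac{p(x, z_k)}{q(z_k \mid x)}\right],
\qquad
\mathcal{L}_{\mathrm{ELBO}}(x) = \mathcal{L}_{1}(x) \le \mathcal{L}_{K}(x) \le \log p(x)
```

IWAE narrows the gap by tightening the bound itself, whereas the approach summarized here aims to narrow it by making the variational posterior more accurate.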
Abstract
Despite advances in deep probabilistic models, learning discrete latent representations remains challenging. This work introduces a novel method to improve inference in discrete Variational Autoencoders by reframing the inference problem through a generative perspective. We conceptualize the model as a communication system, and propose to leverage Error-Correcting Codes (ECCs) to introduce redundancy in latent representations, allowing the variational posterior to produce more accurate estimates and reduce the variational gap. We present a proof-of-concept using a Discrete Variational Autoencoder with binary latent variables and low-complexity repetition codes, extending it to a hierarchical structure for disentangling global and local data features. Our approach significantly improves generation quality, data reconstruction, and uncertainty calibration, outperforming the uncoded models…
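The repetition-code idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `repetition_encode` and `soft_decode`, the repetition factor `R`, and the combining rule (treating the `R` copies as independent observations of one bit under a uniform prior, so their log-odds add) are all assumptions made for illustration.

```python
import numpy as np

def repetition_encode(m, R):
    """Repeat each information bit R times: shape (..., M) -> (..., M * R)."""
    return np.repeat(m, R, axis=-1)

def soft_decode(p_c, R):
    """Combine per-coded-bit probabilities q(c_j = 1 | x) into one
    probability per information bit.

    Assumption (illustrative): the R copies of a bit are independent
    observations and the prior is uniform, so the combined posterior is the
    normalized product of the copies' probabilities, i.e. the sum of their
    log-odds.
    """
    p_c = np.clip(p_c, 1e-7, 1 - 1e-7)           # avoid log(0)
    logits = np.log(p_c) - np.log1p(-p_c)         # per-copy log-odds
    grouped = logits.reshape(*p_c.shape[:-1], -1, R)
    return 1.0 / (1.0 + np.exp(-grouped.sum(axis=-1)))

# Three information bits, each protected by R = 3 repetitions.
m = np.array([1, 0, 1])
c = repetition_encode(m, R=3)

# Hypothetical noisy per-bit posteriors, as an encoder network might output.
p_c = np.array([0.6, 0.7, 0.4,   0.3, 0.2, 0.55,   0.9, 0.5, 0.6])
p_m = soft_decode(p_c, R=3)
m_hat = (p_m > 0.5).astype(int)
```

In this toy input, one copy in each of the first two groups points the wrong way on its own (0.4 and 0.55), yet pooling the three copies still recovers `m`. This is the sense in which redundancy can let the posterior produce more accurate estimates.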
Peer Reviews
Decision: UAI 2025 Poster
Novelty: The paper is novel in that it borrows Error-Correcting Codes (ECCs) and introduces them into the DVAE setting. The motivation is to safeguard the latent variables by adding redundancy to the latent representations through error correction from information theory. The paper claims that the redundancy introduced by the ECC constrains the solution space of $q(c|x)$, therefore leading to a better reconstruction signal to noi
Clarity and rigour: I am lost in Section 4, which introduces how $m$ is sampled and encoded into the required $c$. It seems to me that $c$ is now a linear transformation of $m$ through the known $G$. However, how $c$ is sampled and backpropagated through the DVAE is unclear to me. This section also says: `When comparing uncoded vs. coded DVAEs, the structure of the decoder NN is equal in both cases except for the first Multilayer Perceptron (MLP) layer that attacks the input z'. This is ver
The authors present a theoretically sound and practically useful enhancement to discrete VAEs, noting its practical utility in digital communication. The paper is well-presented; it is easy to follow the contribution from the problems and weaknesses of contemporary approaches to the proposed solution. The experiments validate the theoretical claims, including the claim that the proposed method can be used alongside existing enhancements like IWAEs.
Although the particular enhancement presented is novel, the motivation behind it has been thoroughly studied within the context of VAEs in previous papers, specifically in relation to lossy/lossless compression and communication [1, 2]. The paper mentions Vector Quantized VAEs as an existing approach only briefly, failing to discuss their significance in relation to the aforementioned overlapping subjects. It would be helpful to discuss the general relationship between compression and variation
- The contribution is novel and interesting.
- The paper is well-written, and the mathematics is easy to follow.
- Introducing ECCs results in significant performance improvement in DVAEs across several datasets, as demonstrated in the paper.
While I appreciate the contribution and the presented results, I have a few concerns, listed below:
- Comparison baselines: The baseline used for comparison is just a DVAE with a simple independent prior. The effectiveness of the proposed approach would be better highlighted if it were also compared with DVAEs with Boltzmann machine priors and other discrete VAE models such as VQ-VAE (discussed in the introduction section).
- The ablation study is conducted by ad
Taxonomy
Topics: Machine Learning and Data Classification · Software Engineering Research · Machine Learning and Algorithms
Methods: Variational Inference
