Autoregressivity in the Latent Space of a GP-VAE Language Model: An Empirical Ablation Study
Yves Ruffenach

TL;DR
This paper empirically analyzes the impact of latent autoregression in GP-VAE language models, showing it enhances long-range stability and structure in the latent space compared to non-autoregressive variants.
Contribution
The paper provides a systematic ablation study demonstrating the benefits of latent autoregression in GP-VAE models for language processing.
Findings
Latent autoregression improves long-horizon stability.
Removing autoregression degrades latent structure.
Latent autoregression complements token-level autoregressive models.
Abstract
This paper provides an ablation-based analysis of latent autoregression in GP-VAE models, building upon our previous work introducing the architecture. Language models typically rely on an autoregressive factorization over tokens. In contrast, our prior work proposed shifting sequential structure to the latent space through a causal Gaussian process, while using a non-autoregressive decoder. Here, we conduct a systematic ablation study of the role played by latent autoregression. We compare (i) a full GP-VAE model with autoregressive latent dynamics, (ii) a non-autoregressive ablation in which latent variables are independent, and (iii) a standard token-level autoregressive Transformer. Our results show that, within the considered regime (medium-scale corpora and short training contexts), latent autoregression induces latent trajectories that are significantly more compatible with the…
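The contrast between variants (i) and (ii) above can be illustrated with a toy sketch. It is not the paper's implementation: the abstract does not state which kernel the causal Gaussian process uses, so the code assumes an exponential (Ornstein-Uhlenbeck) kernel, chosen because on an integer time grid it is exactly equivalent to a first-order autoregressive recursion. The latents are one-dimensional, and the function names and the parameters `rho` and `sigma` are hypothetical.

```python
import math
import random


def sample_autoregressive_latents(length, rho=0.9, sigma=1.0, seed=0):
    """Sample a 1-D latent path with Cov(z_s, z_t) = sigma^2 * rho^|s - t|.

    Assumption: exponential kernel. On an integer grid this Gaussian process
    is an AR(1) process, so each z_t depends on the past only through z_{t-1}.
    """
    rng = random.Random(seed)
    # Innovation scale that keeps the marginal variance equal to sigma^2.
    noise_scale = sigma * math.sqrt(1.0 - rho ** 2)
    path = [rng.gauss(0.0, sigma)]
    for _ in range(length - 1):
        path.append(rho * path[-1] + rng.gauss(0.0, noise_scale))
    return path


def sample_independent_latents(length, sigma=1.0, seed=0):
    """Ablation: every latent is drawn independently from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(length)]


def lag1_autocorrelation(path):
    """Empirical correlation between consecutive latents z_t and z_{t+1}."""
    mean = sum(path) / len(path)
    centered = [z - mean for z in path]
    numerator = sum(a * b for a, b in zip(centered, centered[1:]))
    denominator = sum(c * c for c in centered)
    return numerator / denominator


# Compare the temporal structure of the two latent priors.
ar_path = sample_autoregressive_latents(5000)
iid_path = sample_independent_latents(5000)
print("autoregressive lag-1 autocorrelation:", lag1_autocorrelation(ar_path))
print("independent lag-1 autocorrelation:", lag1_autocorrelation(iid_path))
```

Under these assumptions the autoregressive path should show a lag-1 autocorrelation close to `rho`, while the independent path should show one close to zero. This is only the kind of temporal structure that the ablation removes, not a reproduction of the reported results.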
Taxonomy
Topics: Topic Modeling · Language and cultural evolution · Natural Language Processing Techniques
