# Effective Estimation of Deep Generative Language Models

**Authors:** Tom Pelsmaeker, Wilker Aziz

arXiv: 1904.08194 · 2020-05-05

## TL;DR

This paper surveys and systematically compares techniques for estimating variational auto-encoders in language modelling, where posterior collapse makes effective estimation difficult, and it contributes novel techniques, model extensions, and practical recommendations.

## Contribution

The paper contributes a sober view of posterior collapse, a survey of techniques that address it, novel techniques, extensions to the variational auto-encoder, and a systematic comparison of techniques using Bayesian optimisation.

## Key findings

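- In a systematic comparison using Bayesian optimisation, many techniques perform reasonably similarly, given enough resources
- A favourite technique can still be named on the basis of convenience
- The authors report several empirical observations and recommend best practices for researchers estimating these models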

## Abstract

Advances in variational inference enable parameterisation of probabilistic models by deep neural networks. This combines the statistical transparency of the probabilistic modelling framework with the representational power of deep learning. Yet, due to a problem known as posterior collapse, it is difficult to estimate such models in the context of language modelling effectively. We concentrate on one such model, the variational auto-encoder, which we argue is an important building block in hierarchical probabilistic models of language. This paper contributes a sober view of the problem, a survey of techniques to address it, novel techniques, and extensions to the model. To establish a ranking of techniques, we perform a systematic comparison using Bayesian optimisation and find that many techniques perform reasonably similar, given enough resources. Still, a favourite can be named based on convenience. We also make several empirical observations and recommendations of best practices that should help researchers interested in this exciting field.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.08194/full.md

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/1904.08194/full.md

## References

66 references, with the full list in the complete paper: https://tomesphere.com/paper/1904.08194/full.md

---
Source: https://tomesphere.com/paper/1904.08194