Analysing Dropout and Compounding Errors in Neural Language Models
James O'Neill, Danushka Bollegala

TL;DR
This paper empirically evaluates various dropout techniques in neural language models, introduces extensions with varying schedules, and analyzes their impact on reducing compounding errors across different models and datasets.
Contribution
It proposes extended dropout methods with varying schedules and provides a detailed analysis of their effectiveness in mitigating compounding errors in language models.
Findings
Variational curriculum dropout with a linear schedule performs best.
Dropout on the decoder layer yields the largest performance gains.
Proposed methods reduce errors related to compounding in language modeling.
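The best-performing variant named above combines two ideas: a dropout rate that changes over training (curriculum) and a mask that is sampled once per sequence and reused at every time step (variational). The sketch below illustrates that combination in NumPy. This summary does not give the paper's exact formulation, so the linear ramp from 0 to a target rate and the function names `linear_drop_rate`, `variational_mask`, and `apply_over_time` are assumptions made for illustration only.

```python
import numpy as np


def linear_drop_rate(step, total_steps, target_rate):
    """Assumed schedule: ramp the dropout rate linearly from 0 to target_rate."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return target_rate * frac


def variational_mask(batch, hidden, drop_rate, rng):
    """Sample one inverted-dropout mask per sequence (requires drop_rate < 1)."""
    keep = 1.0 - drop_rate
    # Kept units are scaled by 1/keep so the expected activation is unchanged.
    return rng.binomial(1, keep, size=(batch, hidden)) / keep


def apply_over_time(x, mask):
    """x has shape (time, batch, hidden); the same mask is reused at every step."""
    return x * mask[None, :, :]


rng = np.random.default_rng(0)
rate = linear_drop_rate(step=500, total_steps=1000, target_rate=0.5)
mask = variational_mask(batch=4, hidden=8, drop_rate=rate, rng=rng)
dropped = apply_over_time(np.ones((5, 4, 8)), mask)
```

Sampling the mask once and broadcasting it over the time axis is what separates the variational form from standard dropout, which would draw a fresh mask at every step.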
Abstract
This paper carries out an empirical analysis of various dropout techniques for language modelling, such as Bernoulli dropout, Gaussian dropout, Curriculum Dropout, Variational Dropout and Concrete Dropout. Moreover, we propose an extension of variational dropout to concrete dropout and curriculum dropout with varying schedules. We find that these extensions perform well compared to standard dropout approaches, particularly variational curriculum dropout with a linear schedule. The largest performance increases come from applying dropout on the decoder layer. Lastly, as a post-analysis step, we analyze where most of the errors occur at test time to determine whether the well-known problem of compounding errors is apparent and to what extent the proposed methods mitigate this issue for each dataset. We report results on a 2-hidden layer LSTM, GRU and Highway network with embedding dropout,…
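For readers unfamiliar with the two baseline techniques listed first in the abstract, the following is a minimal NumPy sketch of Bernoulli and Gaussian dropout in their commonly used forms. It is not taken from the paper: the function names are hypothetical, and the Gaussian variance `drop_rate / (1 - drop_rate)` is the usual choice that matches the variance of the rescaled Bernoulli mask.

```python
import numpy as np


def bernoulli_dropout(x, drop_rate, rng):
    """Zero each unit with probability drop_rate and rescale the survivors."""
    keep = 1.0 - drop_rate
    mask = rng.binomial(1, keep, size=x.shape) / keep
    return x * mask


def gaussian_dropout(x, drop_rate, rng):
    """Multiply by noise drawn from N(1, drop_rate / (1 - drop_rate))."""
    std = np.sqrt(drop_rate / (1.0 - drop_rate))
    noise = rng.normal(1.0, std, size=x.shape)
    return x * noise


rng = np.random.default_rng(0)
activations = np.ones(10)
print(bernoulli_dropout(activations, 0.5, rng))
print(gaussian_dropout(activations, 0.5, rng))
```

Both functions leave the expected activation unchanged, so no rescaling is needed at test time; they differ only in whether the multiplicative noise is discrete or continuous.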
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Adversarial Robustness in Machine Learning
Methods: Concrete Dropout · Sigmoid Activation · Tanh Activation · Highway Layer · Highway Network · Long Short-Term Memory · Variational Dropout · Dropout
