Length bias in Encoder Decoder Models and a Case for Global Conditioning
Pavel Sountsov, Sunita Sarawagi

TL;DR
This paper identifies a length bias in encoder-decoder models favoring short sequences, especially with larger beam sizes, and proposes a globally conditioned model to mitigate this bias and improve inference efficiency.
Contribution
The paper reveals the cause of length bias in encoder-decoder models and introduces a globally conditioned approach that reduces bias and eliminates the need for beam search.
Findings
Length bias worsens with increasing beam size.
Globally conditioned model alleviates length bias.
Proposed model enables efficient vector-space search.
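The last finding can be pictured with a small sketch. This page does not say how the globally conditioned model scores outputs, so the code assumes one plausible reading: each candidate in the closed set has a single fixed vector, the input is mapped to a vector of the same size, and prediction is the candidate with the largest dot product. The names `dot`, `best_candidate`, and all vectors are hypothetical and not taken from the paper.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def best_candidate(input_vec, candidate_vecs):
    """Return the candidate whose vector has the highest dot product with the input.

    Each candidate is scored once as a whole, so there is no per-step
    product of probabilities that shrinks as the output gets longer.
    """
    return max(candidate_vecs, key=lambda name: dot(input_vec, candidate_vecs[name]))

# Hypothetical embeddings for a closed set of three responses of different lengths.
candidates = {
    "ok": [0.9, 0.1],
    "see you tomorrow at noon": [0.2, 1.0],
    "thanks": [0.5, 0.5],
}
query = [0.1, 0.8]  # hypothetical embedding of the input
choice = best_candidate(query, candidates)
```

In this toy setup the longest response is selected because its vector aligns best with the query; its length plays no role in the score. A linear scan is used here for clarity. Any speedup from an approximate nearest-neighbor index is outside what this sketch shows.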
Abstract
Encoder-decoder networks are popular for modeling sequences probabilistically in many applications. These models use the power of the Long Short-Term Memory (LSTM) architecture to capture the full dependence among variables, unlike earlier models such as CRFs that typically assumed conditional independence among non-adjacent variables. However, in practice, encoder-decoder models exhibit a bias towards short sequences that, surprisingly, gets worse with increasing beam size. In this paper we show that this phenomenon is due to a discrepancy between the full sequence margin and the per-element margin enforced by the locally conditioned training objective of an encoder-decoder model. The discrepancy more adversely impacts long sequences, explaining the bias towards predicting short sequences. For the case where the predicted sequences come from a closed set, we show that a globally…
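A toy illustration of the margin discrepancy, under assumptions that are not from the paper: a hypothetical locally normalized model gives the correct next token probability 0.9 and the end-of-sequence token 0.1 at every step, and the correct output is 30 tokens long. The correct token wins every per-step comparison, yet the full 30-token sequence has a lower total probability than stopping immediately. The simplified beam search below (`beam_search`, `next_token_probs`, and the constants are illustrative names) shows how widening the beam exposes this.

```python
import math

EOS = "<eos>"
TARGET_LEN = 30  # hypothetical length of the correct output

def next_token_probs(prefix):
    """Toy locally normalized model (an assumption for illustration only)."""
    if len(prefix) >= TARGET_LEN:
        return {EOS: 1.0}           # the correct output ends here
    return {"tok": 0.9, EOS: 0.1}   # correct token wins each step by a wide margin

def beam_search(beam_size, max_len=TARGET_LEN + 1):
    """Simplified beam search; returns the best finished (sequence, log-prob)."""
    beams = [((), 0.0)]  # unfinished hypotheses as (prefix, log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, p in next_token_probs(prefix).items():
                candidates.append((prefix + (tok,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            # Hypotheses that emit EOS are complete; the rest keep growing.
            if seq[-1] == EOS:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    return max(finished, key=lambda c: c[1])

greedy_seq, greedy_score = beam_search(beam_size=1)
wide_seq, wide_score = beam_search(beam_size=2)
```

With `beam_size=1` the search follows the per-step winner and returns the full 30-token output. With `beam_size=2` the immediate `<eos>` hypothesis survives in the beam, and its log-probability, log 0.1, is higher than 30 · log 0.9, so the empty output is returned instead. This mirrors the abstract's claim that the bias towards short sequences gets worse as the beam grows.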
