The Importance of Generation Order in Language Modeling
Nicolas Ford, Daniel Duckworth, Mohammad Norouzi, and George E. Dahl

TL;DR
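This paper investigates how the order of token generation affects language model quality. It introduces a two-pass model that first produces a partially filled sentence template and then fills in the missing tokens, and finds that generating function words in the first pass and content words in the second works best.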
Contribution
It proposes a novel two-pass language model and uses it to compare strategies for deciding which tokens are generated in each pass, showing that this choice leads to a surprisingly large variation in model quality.
Findings
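Generating function words in the first pass and content words in the second is the most effective strategy among those compared.
Model quality varies surprisingly widely across strategies for structuring the two passes.
The results justify a more extensive investigation of generation order for neural language models.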
Abstract
Neural language models are a critical component of state-of-the-art systems for machine translation, summarization, audio transcription, and other tasks. These language models are almost universally autoregressive in nature, generating sentences one token at a time from left to right. This paper studies the influence of token generation order on model quality via a novel two-pass language model that produces partially-filled sentence "templates" and then fills in missing tokens. We compare various strategies for structuring these two passes and observe a surprisingly large variation in model quality. We find the most effective strategy generates function words in the first pass followed by content words in the second. We believe these experimental results justify a more extensive investigation of generation order for neural language models.
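The template idea in the abstract can be illustrated with a small data-preparation sketch. This is not the authors' implementation: the function-word list, the placeholder symbol, and the helper names `split_two_pass` and `fill_template` are all assumptions made for illustration. The sketch only shows how a sentence might be split into a first-pass template and the second-pass tokens that fill it, not how either pass is modeled.

```python
# Hypothetical, tiny function-word list; the paper's actual vocabulary
# split is not specified in this summary.
FUNCTION_WORDS = {"the", "a", "an", "of", "on", "in", "and", "to", "is"}

# Assumed placeholder symbol marking a token deferred to the second pass.
PLACEHOLDER = "__"


def split_two_pass(tokens, first_pass_vocab=FUNCTION_WORDS):
    """Split a sentence into a first-pass template and second-pass fills.

    Tokens in `first_pass_vocab` stay in the template; every other token
    is replaced by PLACEHOLDER and collected, in order, as a fill.
    """
    template = [t if t.lower() in first_pass_vocab else PLACEHOLDER
                for t in tokens]
    fills = [t for t in tokens if t.lower() not in first_pass_vocab]
    return template, fills


def fill_template(template, fills):
    """Reconstruct the sentence by filling placeholders left to right."""
    remaining = iter(fills)
    return [next(remaining) if t == PLACEHOLDER else t for t in template]


tokens = "the cat sat on the mat".split()
template, fills = split_two_pass(tokens)
reconstructed = fill_template(template, fills)
```

Passing a different `first_pass_vocab` (for example, a set of content words) would give the reverse ordering, which is the kind of strategy comparison the abstract describes.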
