A Natural Bias for Language Generation Models

Clara Meister; Wojciech Stokowiec; Tiago Pimentel; Lei Yu; Laura Rimell; Adhiguna Kuncoro

arXiv:2212.09686 · cs.CL · June 26, 2023

TL;DR

This paper proposes initializing language models with unigram frequency biases to improve learning efficiency and performance, demonstrated through neural machine translation experiments.

Contribution

It introduces a simple initialization method: setting the bias term of the model's final linear layer to the log-unigram distribution of the target training corpus.

Findings

1. Improved learning efficiency in neural machine translation.
2. Achieved better overall performance with bias initialization.
3. Encouraged models to focus on aspects of language other than token frequency.

Abstract

After just a few hundred training updates, a standard probabilistic model for language generation has likely not yet learnt many semantic or syntactic rules of natural language, making it difficult to estimate the probability distribution over next tokens. Yet around this point, these models have identified a simple, loss-minimising behaviour: to output the unigram distribution of the target training corpus. The use of such a heuristic raises the question: Can we initialise our models with this behaviour and save precious compute resources and model capacity? Here we show that we can effectively endow standard neural language generation models with a separate module that reflects unigram frequency statistics as prior knowledge, simply by initialising the bias term in a model's final linear layer with the log-unigram distribution. We use neural machine translation as a test bed for this…
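The initialisation described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy corpus, the function names `log_unigram_bias` and `softmax`, and the additive smoothing used to avoid log(0) for unseen tokens are all assumptions made for illustration. The sketch shows that when the final layer's weights contribute nothing (for example, at a zero-weight initialisation), a bias equal to the log-unigram distribution makes the model output exactly the unigram distribution.

```python
import math
from collections import Counter

def log_unigram_bias(token_ids, vocab_size, smoothing=1.0):
    """Log-probability of each vocabulary id under a smoothed unigram model.

    Additive smoothing is an assumption of this sketch; it keeps the
    logarithm finite for tokens that never occur in the corpus.
    """
    counts = Counter(token_ids)
    total = len(token_ids) + smoothing * vocab_size
    return [math.log((counts[i] + smoothing) / total) for i in range(vocab_size)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical target-side training corpus, already mapped to token ids.
corpus = [0, 0, 0, 1, 1, 2, 0, 3, 1, 0]
vocab_size = 5

# Bias vector for the final linear layer: logits = W @ h + bias.
bias = log_unigram_bias(corpus, vocab_size)

# If W @ h is zero, the logits equal the bias, so the predicted
# distribution is the (smoothed) unigram distribution itself.
probs = softmax(bias)
```

In a real model, `bias` would be copied into the output projection's bias parameter before training begins; how the rest of the layer is initialised is left open here.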

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Linear Layer