Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing

Richard Diehl Martinez; Zebulon Goriely; Andrew Caines; Paula Buttery; Lisa Beinborn

arXiv:2410.11462·cs.CL·October 16, 2024

TL;DR

This paper proposes Syntactic Smoothing, a method to reduce frequency bias and anisotropy in language models by incorporating syntactic priors, leading to improved handling of infrequent tokens and more balanced representations.

Contribution

Introduces a novel Syntactic Smoothing technique that adjusts the training objective to mitigate frequency bias and anisotropy in language models.

Findings

01 Reduced frequency bias improves performance on rare tokens.

02 Decreased anisotropy correlates with lower frequency bias.

03 Syntactic Smoothing enhances representational diversity.

Abstract

Language models strongly rely on frequency information because they maximize the likelihood of tokens during pre-training. As a consequence, language models tend to not generalize well to tokens that are seldom seen during training. Moreover, maximum likelihood training has been discovered to give rise to anisotropy: representations of tokens in a model tend to cluster tightly in a high-dimensional cone, rather than spreading out over their representational capacity. Our work introduces a method for quantifying the frequency bias of a language model by assessing sentence-level perplexity with respect to token-level frequency. We then present a method for reducing the frequency bias of a language model by inducing a syntactic prior over token representations during pre-training. Our Syntactic Smoothing method adjusts the maximum likelihood objective function to distribute the learning…
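
The anisotropy described in the abstract (representations clustering in a narrow cone) can be illustrated with a small sketch. One common proxy, assumed here and not necessarily the metric used in the paper, is the mean pairwise cosine similarity between representation vectors: values near 0 suggest vectors spread in all directions, while values near 1 suggest a tight cone. The function names and the synthetic vectors below are hypothetical and only serve the illustration.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of vectors.

    Assumption: this is used as a rough anisotropy proxy, not as the
    paper's exact measurement.
    """
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


random.seed(0)
dim = 16

# Synthetic "isotropic" representations: zero-mean Gaussian vectors.
isotropic = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(50)]

# Synthetic "anisotropic" representations: the same vectors shifted by a
# large shared offset, so they all point in roughly the same direction.
offset = [5.0] * dim
cone = [[x + o for x, o in zip(vec, offset)] for vec in isotropic]

iso_score = mean_pairwise_cosine(isotropic)
cone_score = mean_pairwise_cosine(cone)

print(f"isotropic: {iso_score:.3f}")
print(f"cone:      {cone_score:.3f}")
```

In this toy setup the shifted vectors score much higher than the zero-mean ones, which is the pattern the abstract attributes to maximum likelihood training. Applying the same function to hidden states extracted from a real model would be one way to compare a baseline against a smoothed variant, assuming this proxy is acceptable for the comparison.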

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques