Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models
Hao Zhang

TL;DR
This paper introduces a theoretical framework called Language Model Decomposition (LMD) to quantify the linear dependency among pre-trained language models, revealing high correlation and suggesting the need for more diverse models.
Contribution
The paper proposes LMD, a novel method to measure linear dependency among language models, providing a closed-form solution and a goodness-of-fit metric.
Findings
BERT and 11 similar LMs are 91% linearly dependent
Current SOTA LMs are highly correlated
More diverse LMs are needed for progress
Abstract
Pre-trained language models (LMs), such as BERT (Devlin et al., 2018) and its variants, have led to significant improvements on various NLP tasks in recent years. However, a theoretical framework for studying their relationships is still missing. In this paper, we fill this gap by investigating the linear dependency between pre-trained LMs. The linear dependency of LMs is defined analogously to the linear dependency of vectors. We propose Language Model Decomposition (LMD) to represent an LM as a linear combination of other LMs serving as a basis, and derive the closed-form solution. A goodness-of-fit metric for LMD, similar to the coefficient of determination, is defined and used to measure the linear dependency of a set of LMs. In experiments, we find that BERT and eleven (11) BERT-like LMs are 91% linearly dependent. This observation suggests that current state-of-the-art (SOTA) LMs are highly…
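To make the idea in the abstract concrete, here is a minimal sketch of fitting one LM's representations as a linear combination of other LMs' representations and scoring the fit with a coefficient-of-determination style metric. It is an illustration under stated assumptions, not the paper's exact formulation: the function name `lmd_fit`, the use of per-input representation matrices, the column centering, and the ordinary least-squares solver are all assumptions introduced here, and random arrays stand in for real LM outputs.

```python
import numpy as np


def lmd_fit(target, bases):
    """Least-squares fit of a target LM's representations from basis LMs.

    target: (n, d) array, hypothetical representations of the target LM
            on n inputs.
    bases:  list of (n, d_k) arrays, representations of the basis LMs
            on the same n inputs.
    Returns (W, r2): the coefficient matrix and an R^2-style score.
    """
    # Stack all basis representations side by side: shape (n, sum of d_k).
    X = np.hstack(bases)

    # Assumption: center columns so the score matches the usual R^2
    # (this is equivalent to fitting an intercept).
    Xc = X - X.mean(axis=0)
    Yc = target - target.mean(axis=0)

    # Minimum-norm least-squares solution of Xc @ W ~= Yc.
    W, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)

    ss_res = np.sum((Yc - Xc @ W) ** 2)  # unexplained variation
    ss_tot = np.sum(Yc ** 2)             # total variation
    if ss_tot == 0:
        raise ValueError("target representations have zero variance")
    return W, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for two basis LMs and one target LM.
    lm_a = rng.normal(size=(200, 4))
    lm_b = rng.normal(size=(200, 3))
    target = lm_a @ rng.normal(size=(4, 5)) + lm_b @ rng.normal(size=(3, 5))
    _, r2 = lmd_fit(target, [lm_a, lm_b])
    print(f"R^2 for an exactly dependent target: {r2:.4f}")
```

A score near 1 means the target is almost fully explained by the basis LMs, which is the sense in which a figure such as "91% linearly dependent" can be read; a score near 0 means the basis explains little of the target.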
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Adam · Weight Decay · Attention Dropout · Linear Layer · WordPiece · Layer Normalization
