Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models

Hao Zhang

arXiv:2210.10289 · cs.CL · October 24, 2022


TL;DR

This paper introduces a theoretical framework called Language Model Decomposition (LMD) to quantify the linear dependency among pre-trained language models, revealing high correlation and suggesting the need for more diverse models.

Contribution

The paper proposes LMD, a novel method to measure linear dependency among language models, providing a closed-form solution and a goodness-of-fit metric.
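As a rough illustration of the idea, the sketch below fits one model's representations as a linear combination of other models' representations and reports an R²-style score. It assumes ordinary least squares with an intercept as the fitting procedure and uses synthetic random matrices in place of real LM outputs; the function names (`design_matrix`, `lmd_fit`, `r_squared`) are hypothetical, and the paper's exact formulation and metric may differ.

```python
import numpy as np

def design_matrix(bases):
    """Stack basis representations column-wise and append an intercept column."""
    n = bases[0].shape[0]
    return np.hstack(list(bases) + [np.ones((n, 1))])

def lmd_fit(target, bases):
    """Least-squares weights mapping the stacked basis outputs to the target."""
    X = design_matrix(bases)
    W, *_ = np.linalg.lstsq(X, target, rcond=None)
    return W

def r_squared(target, bases, W):
    """Coefficient-of-determination-style score: 1 - residual SS / total SS."""
    residual = target - design_matrix(bases) @ W
    centered = target - target.mean(axis=0, keepdims=True)
    return 1.0 - (residual ** 2).sum() / (centered ** 2).sum()

# Synthetic stand-ins for per-sentence representations (assumption: not real LM outputs).
rng = np.random.default_rng(0)
n = 200
basis_a = rng.normal(size=(n, 4))
basis_b = rng.normal(size=(n, 3))
bases = [basis_a, basis_b]

# A target that lies exactly in the span of the bases, and one that is unrelated.
dependent = np.hstack(bases) @ rng.normal(size=(7, 5))
independent = rng.normal(size=(n, 5))

r2_dep = r_squared(dependent, bases, lmd_fit(dependent, bases))
r2_ind = r_squared(independent, bases, lmd_fit(independent, bases))
print(f"dependent target:   R^2 = {r2_dep:.3f}")
print(f"independent target: R^2 = {r2_ind:.3f}")
```

A score near 1 means the target is almost fully explained by the basis models, while a score near 0 means the bases carry little linear information about it. The intercept column keeps the score non-negative in this sketch.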

Findings

01. BERT and 11 similar LMs are 91% linearly dependent

02. Current SOTA LMs are highly correlated

03. More diverse LMs are needed for progress

Abstract

Pre-trained language models (LMs), such as BERT (Devlin et al., 2018) and its variants, have led to significant improvements on various NLP tasks in recent years. However, a theoretical framework for studying their relationships is still missing. In this paper, we fill this gap by investigating the linear dependency between pre-trained LMs. The linear dependency of LMs is defined analogously to the linear dependency of vectors. We propose Language Model Decomposition (LMD) to represent an LM as a linear combination of other LMs serving as a basis, and derive the closed-form solution. A goodness-of-fit metric for LMD, similar to the coefficient of determination, is defined and used to measure the linear dependency of a set of LMs. In experiments, we find that BERT and eleven (11) BERT-like LMs are 91% linearly dependent. This observation suggests that current state-of-the-art (SOTA) LMs are highly…


Code & Models

Repositories

haozhg/lmd (Official)


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Adam · Weight Decay · Attention Dropout · Linear Layer · WordPiece · Layer Normalization