Stable Anisotropic Regularization

William Rudman; Carsten Eickhoff

arXiv:2305.19358 · cs.CL · April 5, 2024 · 2 cites

TL;DR

This paper introduces I-STAR, a novel regularization method that accurately measures and adjusts isotropy in LLM embeddings, showing that decreasing isotropy can enhance model performance.

Contribution

The paper proposes I-STAR, a regularization method built on IsoScore*, a differentiable and stable measure of isotropy, and demonstrates that reducing isotropy improves NLP model performance, challenging previous assumptions.

Findings

01

Decreasing isotropy improves task performance.

02

IsoScore* is a reliable measure of isotropy.

03

Regularization with I-STAR enhances training stability.

Abstract

Given the success of Large Language Models (LLMs), there has been considerable interest in studying the properties of model activations. The literature overwhelmingly agrees that LLM representations are dominated by a few "outlier dimensions" with exceedingly high variance and magnitude. Several studies in Natural Language Processing (NLP) have sought to mitigate the impact of such outlier dimensions and force LLMs to be isotropic (i.e., have uniform variance across all dimensions in embedding space). Isotropy is thought to be a desirable property for LLMs that improves model performance and more closely aligns textual representations with human intuition. However, many of the claims regarding isotropy in NLP have been based on the average cosine similarity of embeddings, which has recently been shown to be a flawed measure of isotropy. In this paper, we propose I-STAR: IsoScore*-based…
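The abstract's point that average cosine similarity is a flawed isotropy measure can be illustrated with a toy sketch. This is not IsoScore* and not the authors' code; the helper names (`avg_cosine_similarity`, `per_dimension_variance`) and the synthetic Gaussian data are assumptions made purely for illustration. The sketch shifts an isotropic point cloud by a constant offset: the per-dimension variance is unchanged, yet the average cosine similarity moves from near 0 to near 1.

```python
import math
import random


def avg_cosine_similarity(points):
    """Mean cosine similarity over all distinct pairs of vectors."""
    norms = [math.sqrt(sum(x * x for x in p)) for p in points]
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dot = sum(x * y for x, y in zip(points[i], points[j]))
            total += dot / (norms[i] * norms[j])
            count += 1
    return total / count


def per_dimension_variance(points):
    """Population variance of each coordinate, computed independently."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    return [sum((p[d] - means[d]) ** 2 for p in points) / n for d in range(dims)]


rng = random.Random(0)  # fixed seed so the toy data is reproducible
n_points, dim = 200, 8  # arbitrary sizes chosen for the illustration

# Synthetic isotropic cloud: zero mean, unit variance in every dimension.
centered = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
# Same cloud translated by a constant; the spread around the mean is identical.
shifted = [[x + 10.0 for x in p] for p in centered]

cos_centered = avg_cosine_similarity(centered)
cos_shifted = avg_cosine_similarity(shifted)
var_centered = per_dimension_variance(centered)
var_shifted = per_dimension_variance(shifted)

print("avg cosine (centered):", round(cos_centered, 3))
print("avg cosine (shifted): ", round(cos_shifted, 3))
print("max variance change:  ",
      max(abs(a - b) for a, b in zip(var_centered, var_shifted)))
```

Because the translation changes only the mean, a variance-based notion of isotropy (uniform variance across dimensions, as defined in the abstract) treats both clouds identically, while the cosine-based score does not.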

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 2

Strengths

This paper challenges the dominant belief in the NLP literature by showing that anisotropy is beneficial. Its findings have the potential to significantly influence future research directions in the field. It also introduces a new way to compute isotropy in models. The authors conducted a set of experiments to show its effectiveness, comparing it to CosReg.

Weaknesses

If Figure 3 is meant to show a trend in which higher IsoScore* leads to lower accuracy, a correlation coefficient and a significance score should be added to support this claim.

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 3

Strengths

The paper is well written and clear. The proposed improvement of IsoScore into IsoScore* is fairly straightforward. The fact that it is a more accurate and more convenient estimate of isotropy in LLMs is argued very well and supported by some empirical results. The paper convincingly argues that isotropy and its impact on performance are not properly understood in NLP, which is a very significant contribution. Experimental results mostly support the arguments in the paper (more comments below).

Weaknesses

* There is no significance testing on the results (Table 1), but there are error bars (good!), and these seem to indicate that most of the differences outlined are hardly significant (e.g. RTE, 72.56+/-1.29 vs 71.34+/-0.91). This makes it difficult to get a clear picture of the resulting effect of decreasing isotropy.
* Similarly, Fig. 3 is difficult to interpret: there are clear decreasing trends in some plots, but not in most of them.
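The kind of check the reviewer asks for can be sketched with Welch's t-test computed from summary statistics. The two means and spreads below are the RTE numbers quoted in the review; everything else is an assumption: the number of runs per condition (`n_runs = 5`) is not given in this excerpt, the "+/-" values are treated as sample standard deviations, and the function name `welch_t` is hypothetical. This is an illustration of the procedure, not a result from the paper.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Inputs are summary statistics: sample mean, sample standard deviation,
    and number of runs for each of the two conditions.
    """
    var_term_a = sd_a ** 2 / n_a
    var_term_b = sd_b ** 2 / n_b
    std_err = math.sqrt(var_term_a + var_term_b)
    t_stat = (mean_a - mean_b) / std_err
    dof = (var_term_a + var_term_b) ** 2 / (
        var_term_a ** 2 / (n_a - 1) + var_term_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# ASSUMPTION: number of runs per condition; not stated in the review excerpt.
n_runs = 5

# RTE scores quoted by the reviewer; "+/-" assumed to be a sample std. dev.
t_stat, dof = welch_t(72.56, 1.29, n_runs, 71.34, 0.91, n_runs)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.2f}")
```

The resulting statistic would then be compared against a t-distribution critical value for the computed degrees of freedom; the outcome depends directly on the assumed number of runs.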

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

- In the NLP field, the isotropy of internal representations was believed to be the key to model success. The message of this paper, suggesting that the **an**isotropy of internal representations might be the key to performance improvement, will likely resonate intriguingly with many readers.
- The paper comprehensively covers a collection of works related to the isotropy of NLP models, making it a highly self-contained piece for readers.

Weaknesses

### 1. The reasons for contradictions with prior research are unclear, weakening the persuasiveness of the main claim.

The authors' main claim that "**an**isotropy is the key to model performance improvement" isn't reconciled with prior research which posits that "isotropy is the key to model performance improvement". While the authors suggest that the discrepancy arises from the evaluation metrics used (as stated “previous studies have made claims using “flawed” measures of isotropy,” on page 7

Code & Models

Repositories

bcbi-edu/p_eickhoff_isoscore
PyTorch · Official

Videos

Stable Anisotropic Regularization · SlidesLive

Taxonomy

Topics: Numerical methods in inverse problems · Advanced Measurement and Metrology Techniques · Topology Optimization in Engineering