When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models

Ting-Yun Chang; Jesse Thomason; Robin Jia

arXiv:2406.13131·cs.CL·October 8, 2024

TL;DR

This paper investigates the internal components of large language models, revealing that individual parts can outperform the full model on classification tasks, and introduces a reweighting method to enhance performance.

Contribution

The paper decomposes LLM outputs into the contributions of individual attention heads and MLPs (components), analyzes how those components behave, and proposes a reweighting technique that improves in-context learning accuracy.

Findings

01

Component accuracies are consistent across prompts and perturbations.

02

With 24 labeled examples, reweighting components improves accuracy by an average of 6.0 percentage points over 24-shot ICL across 8 tasks on Llama-2-7B.

03

Some components perform well individually despite poor overall model performance.

Abstract

This paper studies in-context learning by decomposing the output of large language models into the individual contributions of attention heads and MLPs (components). We observe curious components: good-performing ones that individually do well on a classification task, even when the model performs poorly; bad-performing ones that do much worse than chance; and label-biased components that always predict the same label. We find that component accuracies are well-correlated across different demonstration sets and perturbations of prompt templates. Based on our findings, we propose component reweighting, which learns to linearly re-scale the component activations from a few labeled examples. Given 24 labeled examples, our method improves by an average of 6.0% accuracy points over 24-shot ICL across 8 tasks on Llama-2-7B. Overall, this paper both enriches our understanding of ICL and…
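The reweighting idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: it assumes the per-component logits have already been extracted into an array of shape (examples, components, classes) and that the full-model logits are simply their sum. The toy data, the function names (`make_toy`, `component_accuracies`, `fit_component_weights`, `predict`), and the plain gradient-descent settings are all hypothetical choices made for illustration. The toy data contains one good component, one worse-than-chance component, and one label-biased component, mirroring the three kinds described above.

```python
import numpy as np

def make_toy(labels, rng):
    """Hypothetical per-component logits, shape (n_examples, 3 components, 2 classes)."""
    onehot = np.eye(2)[labels]
    good = 1.0 * onehot                                   # votes for the correct label
    bad = -2.0 * onehot                                   # votes against the correct label
    biased = np.tile([3.0, 0.0], (len(labels), 1))        # always prefers label 0
    comps = np.stack([good, bad, biased], axis=1)
    return comps + 0.1 * rng.normal(size=comps.shape)

def component_accuracies(comp_logits, labels):
    """Accuracy of each component when it predicts on its own."""
    preds = comp_logits.argmax(axis=-1)                   # (n_examples, n_components)
    return (preds == labels[:, None]).mean(axis=0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fit_component_weights(comp_logits, labels, lr=0.05, steps=500):
    """Learn one scalar weight per component by minimizing cross-entropy."""
    n, c, k = comp_logits.shape
    w = np.ones(c)                                        # w = 1 recovers the unweighted sum
    onehot = np.eye(k)[labels]
    for _ in range(steps):
        probs = softmax(np.einsum("c,nck->nk", w, comp_logits))
        # d(loss)/d(logits) = probs - onehot, chained through the weighted sum
        grad = np.einsum("nk,nck->c", probs - onehot, comp_logits) / n
        w -= lr * grad
    return w

def predict(comp_logits, w):
    return np.einsum("c,nck->nk", w, comp_logits).argmax(axis=-1)

rng = np.random.default_rng(0)
train_labels = np.arange(24) % 2                          # 24 labeled examples
test_labels = np.arange(200) % 2
train_x = make_toy(train_labels, rng)
test_x = make_toy(test_labels, rng)

accs = component_accuracies(test_x, test_labels)
w = fit_component_weights(train_x, train_labels)
full_acc = (predict(test_x, np.ones(3)) == test_labels).mean()
rw_acc = (predict(test_x, w) == test_labels).mean()

print("per-component accuracy:", accs)
print("learned weights:", w)
print("unweighted sum accuracy:", full_acc)
print("reweighted accuracy:", rw_acc)
```

In this toy setting the unweighted sum is dominated by the bad and label-biased components, while the learned weights let the good component drive the prediction. On a real model, the component logits would come from the decomposition described in the paper rather than from synthetic data.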

Code & Models

Repositories

terarachang/LLMDecomp
PyTorch · Official

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need