Knowledge is a Region in Weight Space for Fine-tuned Language Models

Almog Gueta; Elad Venezian; Colin Raffel; Noam Slonim; Yoav Katz; Leshem Choshen

arXiv:2302.04863·cs.LG·October 16, 2023·1 cites

TL;DR

This paper explores the structure of weight space in fine-tuned language models, revealing that high-performing models occupy well-defined regions, and that traversing these regions can produce models with improved or comparable performance.

Contribution

It introduces the concept of a 'region' in weight space where fine-tuned models reside, and proposes a method for selecting better models based on this insight.

Findings

01. Fine-tuned models form tight clusters in weight space.

02. Traversing regions between models can yield better performance.

03. Starting from the region center improves fine-tuning efficiency.
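
The findings above can be illustrated with a minimal sketch. It assumes that "traversing the region between models" means linear interpolation of weights and that the "region center" is the plain average of several fine-tuned models; this page does not spell out the paper's exact procedure, so both are assumptions. The `Weights` type and the functions `interpolate` and `centroid` are hypothetical names, with flat lists of floats standing in for real parameter tensors.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def interpolate(w_a: Weights, w_b: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * w_a + alpha * w_b, parameter by parameter."""
    if w_a.keys() != w_b.keys():
        raise ValueError("models must share the same parameter names")
    return {
        name: [(1.0 - alpha) * a + alpha * b for a, b in zip(w_a[name], w_b[name])]
        for name in w_a
    }


def centroid(models: List[Weights]) -> Weights:
    """Average the weights of several models (the assumed 'region center')."""
    if not models:
        raise ValueError("need at least one model")
    n = len(models)
    return {
        name: [sum(vals) / n for vals in zip(*(m[name] for m in models))]
        for name in models[0]
    }


# Toy weights for two models fine-tuned from the same pretrained checkpoint.
model_a: Weights = {"layer.weight": [0.0, 2.0]}
model_b: Weights = {"layer.weight": [4.0, 6.0]}

midpoint = interpolate(model_a, model_b, alpha=0.5)
center = centroid([model_a, model_b])
```

With two models the midpoint and the centroid coincide; with more models, the centroid is one candidate starting point for further fine-tuning, and each interpolated point would need to be evaluated on held-out data to check whether it actually performs well.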

Abstract

Research on neural networks has focused on understanding a single model trained on a single dataset. However, relatively little is known about the relationships between different models, particularly those trained or tested on different datasets. We address this by studying how the weight space and the underlying loss landscape of different models are interconnected. Specifically, we demonstrate that finetuned models that were optimized for high performance reside in well-defined regions in weight space, and vice versa -- that any model that resides anywhere in those regions also exhibits high performance. Notably, we show that language models that have been finetuned on the same dataset form a tight cluster in the weight space, while models finetuned on different datasets from the same underlying task form a looser cluster. Moreover, traversing around the region between the models…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)