Exploring Mode Connectivity for Pre-trained Language Models
Yujia Qin, Cheng Qian, Jing Yi, Weize Chen, Yankai Lin, Xu Han, Zhiyuan Liu, Maosong Sun, Jie Zhou

TL;DR
This paper investigates the geometric connections between different minima of pre-trained language models using mode connectivity, revealing how hyperparameters, training methods, and data influence these connections and what this implies for model adaptation.
Contribution
It provides the first empirical analysis of mode connectivity in PLMs, exploring how various factors affect the geometric relationships between minima during pre-training and adaptation.
Findings
Hyperparameters and tuning methods significantly influence mode connectivity.
Mode connectivity evolves during pre-training, affecting downstream adaptation.
Understanding these connections may help inform PLM fine-tuning strategies.
Abstract
Recent years have witnessed the prevalent application of pre-trained language models (PLMs) in NLP. From the perspective of parameter space, PLMs provide generic initialization, starting from which high-performance minima could be found. Although plenty of works have studied how to effectively and efficiently adapt PLMs to high-performance minima, little is known about the connection of various minima reached under different adaptation configurations. In this paper, we investigate the geometric connections of different minima through the lens of mode connectivity, which measures whether two minima can be connected with a low-loss path. We conduct empirical analyses to investigate three questions: (1) how could hyperparameters, specific tuning methods, and training data affect PLM's mode connectivity? (2) How does mode connectivity change during pre-training? (3) How does the PLM's task…
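To make the notion of a "low-loss path" concrete, here is a minimal sketch that probes the simplest candidate path, a straight line between two parameter vectors, and reports how far the loss rises above the endpoints. This is an illustration under stated assumptions, not the paper's procedure: the summary above does not say how paths are constructed, and the names `linear_path`, `loss_barrier`, and `toy_loss` are hypothetical, with a two-dimensional toy loss standing in for a real PLM objective.

```python
import numpy as np


def linear_path(theta_a, theta_b, num_points=11):
    """Evenly spaced points on the straight line from theta_a to theta_b."""
    alphas = np.linspace(0.0, 1.0, num_points)
    return [(1.0 - a) * theta_a + a * theta_b for a in alphas]


def loss_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Highest loss on the linear path minus the worse endpoint loss.

    This is one simple variant of a barrier measure (an assumption here);
    a value near zero suggests the two minima are linearly connected.
    """
    path = linear_path(theta_a, theta_b, num_points)
    losses = np.array([loss_fn(theta) for theta in path])
    endpoint_max = max(losses[0], losses[-1])
    return float(losses.max() - endpoint_max), losses


def toy_loss(theta):
    """Hypothetical stand-in loss with two minima, at (-1, 0) and (1, 0)."""
    x, y = theta
    return (x ** 2 - 1.0) ** 2 + y ** 2


# Two "minima" of the toy loss, standing in for two adapted checkpoints.
theta_a = np.array([-1.0, 0.0])
theta_b = np.array([1.0, 0.0])

barrier, losses = loss_barrier(toy_loss, theta_a, theta_b)
print(f"barrier along the linear path: {barrier:.3f}")
```

A large barrier on the straight line does not by itself mean the two minima are disconnected, since a curved path could still stay in a low-loss region; the linear probe is only the cheapest first check.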
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
