Landscape-Aware Growing: The Power of a Little LAG

Stefani Karp; Nikunj Saunshi; Sobhan Miryoosefi; Sashank J. Reddi; Sanjiv Kumar

arXiv:2406.02469 · cs.LG · June 5, 2024

Open Access

TL;DR

This paper introduces landscape-aware growing (LAG). LAG selects the best model-growing strategy from a pool of candidates using early training dynamics rather than behavior at initialization, which improves the efficiency of training Transformer models.

Contribution

It proposes judging growing strategies by their early training behavior rather than by metrics measured at initialization. It then uses this perspective to develop an adaptive strategy for model stacking.

Findings

01. Performance in the early steps of training correlates better with final performance than metrics measured at initialization do.
02. LAG predicts the best growing strategy more accurately.
03. Adaptive stacking strategies improve training efficiency.

Abstract

Recently, there has been increasing interest in efficient pretraining paradigms for training Transformer-based models. Several recent approaches use smaller models to initialize larger models in order to save computation (e.g., stacking and fusion). In this work, we study the fundamental question of how to select the best growing strategy from a given pool of growing strategies. Prior works have extensively focused on loss- and/or function-preserving behavior at initialization or simply performance at the end of training. Instead, we identify that behavior at initialization can be misleading as a predictor of final performance and present an alternative perspective based on early training dynamics, which we call "landscape-aware growing (LAG)". We perform an extensive analysis of the correlation between final performance and performance in the initial steps of training and find early and more…
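The early-training perspective in the abstract suggests a simple selection rule: grow each candidate, train it briefly, and keep the one with the lowest loss after k steps. The sketch below is a minimal Python version of that reading under stated assumptions, not the authors' implementation. The callback `loss_after`, the helper `select_by_early_loss`, the strategy names, and the loss numbers are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical interface: loss_after(strategy, k) grows a model with `strategy`,
# trains it for k steps, and returns the resulting loss.
LossFn = Callable[[str, int], float]


def select_by_early_loss(strategies: Sequence[str], loss_after: LossFn, k: int) -> str:
    """Pick the strategy with the lowest loss after k training steps.

    k = 0 corresponds to selecting by loss at initialization; k > 0 is the
    early-training view sketched here.
    """
    if not strategies:
        raise ValueError("need at least one candidate strategy")
    if k < 0:
        raise ValueError("k must be non-negative")
    # min() keeps the first strategy on ties, so list order acts as a tie-breaker.
    return min(strategies, key=lambda s: loss_after(s, k))


if __name__ == "__main__":
    # Made-up loss values for two hypothetical strategies, for illustration only.
    toy_losses = {
        "function_preserving": {0: 3.0, 500: 2.9},
        "stacking": {0: 4.5, 500: 2.7},
    }

    def toy_loss_after(strategy: str, k: int) -> float:
        return toy_losses[strategy][k]

    candidates = list(toy_losses)
    print(select_by_early_loss(candidates, toy_loss_after, k=0))    # function_preserving
    print(select_by_early_loss(candidates, toy_loss_after, k=500))  # stacking
```

With k = 0 the rule reduces to judging strategies at initialization. The toy numbers are chosen so that the two views disagree, which is the situation the abstract warns about. In practice, the cost of the short runs for every candidate has to stay small relative to the full training budget.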
