Tending Towards Stability: Convergence Challenges in Small Language Models

Richard Diehl Martinez; Pietro Lesci; Paula Buttery

arXiv:2410.11451 · cs.CL · October 16, 2024

TL;DR

This paper investigates why small language models underperform during late training stages, analyzing how their convergence dynamics differ from larger models and linking these issues to the effective rank of their parameters.

Contribution

The study provides a detailed analysis of training dynamics in small versus large models, highlighting the impact of parameter effective rank on convergence stability.

Findings

01

Larger models stabilize early, within roughly the first 20% of training.

02

Small models show slower, less stable convergence.

03

Lower effective rank correlates with convergence issues.
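
To make the third finding concrete, here is a minimal sketch of one common way to quantify the effective rank of a weight matrix: the exponential of the Shannon entropy of its normalized singular values. This definition is an assumption chosen for illustration; the paper's exact formulation may differ, and the function name `effective_rank` is hypothetical.

```python
import numpy as np


def effective_rank(weight: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of a 2D matrix (illustrative definition)."""
    # Singular values are non-negative and sorted in descending order.
    singular_values = np.linalg.svd(weight, compute_uv=False)
    # Normalize them into a probability distribution.
    p = singular_values / (singular_values.sum() + eps)
    # Drop exact zeros so that log() stays finite.
    p = p[p > 0]
    entropy = -(p * np.log(p)).sum()
    # exp(entropy) lies between 1 and min(weight.shape).
    return float(np.exp(entropy))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    full = rng.standard_normal((64, 64))
    low = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64))
    print("dense random matrix:", effective_rank(full))
    print("rank-2 product:     ", effective_rank(low))
```

Under this definition, a matrix whose singular values are all equal attains the maximum value (its full rank), while a matrix dominated by a few directions scores close to the number of those directions.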

Abstract

Increasing the number of parameters in language models is a common strategy to enhance their performance. However, smaller language models remain valuable due to their lower operational costs. Despite their advantages, smaller models frequently underperform compared to their larger counterparts, even when provided with equivalent data and computational resources. Specifically, their performance tends to degrade in the late pretraining phase. This is anecdotally attributed to their reduced representational capacity. Yet, the exact causes of this performance degradation remain unclear. We use the Pythia model suite to analyse the training dynamics that underlie this phenomenon. Across different model sizes, we investigate the convergence of the Attention and MLP activations to their final state and examine how the effective rank of their parameters influences this process. We find that…
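
As an illustration of what "convergence of activations to their final state" can mean in practice, the sketch below compares a layer's activations at an intermediate checkpoint with the same layer's activations at the final checkpoint using linear centered kernel alignment (CKA). Treating linear CKA as the similarity measure is an assumption made here for illustration, and the function name `linear_cka` and the synthetic arrays are hypothetical stand-ins for real Pythia activations.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_examples, n_features)."""
    # Center each feature over the examples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: "final" activations and noisier "earlier" versions.
    final = rng.standard_normal((256, 16))
    for noise in (2.0, 1.0, 0.5, 0.0):
        earlier = final + noise * rng.standard_normal(final.shape)
        print(f"noise={noise:.1f}  CKA to final state={linear_cka(earlier, final):.3f}")
```

A score of 1 means the two sets of activations match up to rotation and uniform scaling. Plotting this score per layer across training checkpoints gives a convergence curve of the kind the abstract describes.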

Code & Models

Repositories

eleutherai/pythia
PyTorch · Official

Videos

Tending Towards Stability: Convergence Challenges in Small Language Models · Underline

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need · Pythia