Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review

Neha Prakriya; Jui-Nan Yen; Cho-Jui Hsieh; Jason Cong

arXiv:2409.06131 · cs.CL · January 30, 2025

TL;DR

This paper introduces the LFR pedagogy, a dynamic training method for large language models that improves learning efficiency and retention by focusing on challenging data regions, reducing training costs significantly.

Contribution

The paper proposes the Learn-Focus-Review paradigm, a novel adaptive training approach that enhances LLM pretraining efficiency by prioritizing difficult data regions based on model performance.

Findings

01. LFR reduces training tokens by up to 19% while maintaining performance.

02. LFR pretrained models outperform baseline models in various tasks.

03. LFR matches or exceeds industry-standard models with fewer training tokens.

Abstract

Traditional Large Language Model (LLM) pretraining relies on autoregressive language modeling with randomly sampled data from web-scale datasets. Inspired by human learning techniques like spaced repetition, we hypothesize that random sampling leads to high training costs, lower-quality models, and significant data forgetting. To address these inefficiencies, we propose the Learn-Focus-Review (LFR) paradigm, a dynamic training approach that adapts to the model's learning progress. LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset that are more prone to being forgotten, enabling better retention and more efficient learning. Using the LFR paradigm, we pretrained Llama and GPT models on the SlimPajama and OpenWebText datasets, respectively. These models were evaluated on downstream tasks…
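
The idea of tracking performance per data block and revisiting the hard ones can be illustrated with a small sketch. This is not the authors' implementation: the three-phase ordering, the use of a per-block loss as the difficulty signal, the `keep_fraction` parameter, and the names `select_focus_blocks` and `lfr_schedule` are all assumptions made for illustration. The hypothetical `loss_fn` stands in for one training step on a block that returns its loss.

```python
import random


def select_focus_blocks(block_losses, keep_fraction):
    """Return the ids of the hardest blocks (highest recorded loss).

    keep_fraction is a hypothetical knob, not a value from the paper.
    """
    n_keep = max(1, int(len(block_losses) * keep_fraction))
    ranked = sorted(block_losses, key=block_losses.get, reverse=True)
    return ranked[:n_keep]


def lfr_schedule(block_ids, loss_fn, keep_fraction=0.5, seed=0):
    """Run one illustrative Learn -> Focus -> Review cycle.

    loss_fn(block_id) is assumed to train on the block and return its loss.
    Returns the visit log and the latest recorded loss per block.
    """
    rng = random.Random(seed)
    order = list(block_ids)
    block_losses = {}
    visited = []

    # Learn: one pass over all blocks in random order, recording each loss.
    rng.shuffle(order)
    for b in order:
        block_losses[b] = loss_fn(b)
        visited.append(("learn", b))

    # Focus: revisit only the blocks the model currently finds hardest.
    for b in select_focus_blocks(block_losses, keep_fraction):
        block_losses[b] = loss_fn(b)
        visited.append(("focus", b))

    # Review: another pass over all blocks to counter forgetting.
    rng.shuffle(order)
    for b in order:
        block_losses[b] = loss_fn(b)
        visited.append(("review", b))

    return visited, block_losses
```

In a real training loop, `loss_fn` would wrap a forward and backward pass on the block, and the cycle would repeat with difficulty scores refreshed each time; the sketch only shows how recorded per-block performance can drive which data is revisited.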

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · GPT · LLaMA · Pythia · Linear Layer · Multi-Head Attention · Cosine Annealing · Byte Pair Encoding · Softmax