Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization
Sumedh Rasal

TL;DR
This paper presents Predictive Batch Scheduling (PBS), a lightweight method that accelerates language model training by dynamically prioritizing high-loss samples using a simple online predictor based on token-level features.
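The TL;DR describes batch construction that favors samples the predictor expects to be hard. Below is a minimal Python sketch of that step. It assumes a candidate pool larger than the batch is drawn first and the highest-scoring candidates are kept. The pool-then-top-k rule and the names `build_batch` and `predict_loss` are illustrative assumptions, not details confirmed by this page.

```python
import heapq
from typing import Callable, List, Sequence


def build_batch(
    pool: Sequence[int],
    predict_loss: Callable[[int], float],
    batch_size: int,
) -> List[int]:
    """Return the batch_size sample ids from `pool` with the highest predicted loss.

    Hypothetical helper: `pool` is a candidate set drawn from the dataset and
    `predict_loss` maps a sample id to the predictor's loss estimate.
    """
    if batch_size > len(pool):
        raise ValueError("pool must contain at least batch_size candidates")
    # heapq.nlargest keeps the k largest items by key, sorted in descending order.
    return heapq.nlargest(batch_size, pool, key=predict_loss)


# Usage: stand-in predicted losses for a small candidate pool.
pool = [3, 10, 7, 1, 9, 4]
scores = {3: 0.2, 10: 2.5, 7: 1.1, 1: 0.9, 9: 3.0, 4: 0.4}
batch = build_batch(pool, scores.__getitem__, batch_size=3)  # [9, 10, 7]
```

Pure top-k selection can repeatedly skip samples that score low. Mixing in some uniformly sampled examples is one possible safeguard, although this page does not say whether PBS does so.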
Contribution
Introducing PBS, a novel, low-overhead technique that uses a lightweight predictor to estimate sample difficulty and improve training efficiency for language models.
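The contribution centers on a lightweight predictor trained online. The abstract describes this predictor as linear, so one plausible form is a linear regressor that takes a single stochastic gradient step on squared error whenever a sample's true training loss becomes available. The class name `OnlineLossPredictor`, the squared-error objective, and the learning rate are assumptions made for illustration.

```python
from typing import List, Sequence


class OnlineLossPredictor:
    """Hypothetical online linear regressor: predicted_loss = w . x + b.

    Each observed (features, loss) pair triggers one SGD step on squared error.
    """

    def __init__(self, num_features: int, lr: float = 0.01) -> None:
        self.w: List[float] = [0.0] * num_features
        self.b: float = 0.0
        self.lr = lr

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x: Sequence[float], observed_loss: float) -> float:
        """Step on 0.5 * (prediction - observed_loss)**2; return the pre-step error."""
        error = self.predict(x) - observed_loss
        # Gradient w.r.t. w_i is error * x_i; gradient w.r.t. b is error.
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error
        return error


# Usage: fit noiseless synthetic data where loss = 2 * x + 1.
predictor = OnlineLossPredictor(num_features=1, lr=0.1)
for _ in range(500):
    for x0 in (0.0, 0.5, 1.0):
        predictor.update([x0], 2.0 * x0 + 1.0)
estimate = predictor.predict([0.25])  # approaches 2 * 0.25 + 1 = 1.5
```

The synthetic example is noiseless, so the fit converges to the exact line. Real per-sample losses are noisy, and a linear model over four features will only partially explain them.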
Findings
PBS achieves 6-13% faster convergence, measured by evaluation loss, when training a 130M-parameter transformer.
The predictor's correlation with actual loss reaches 0.44 using only four features.
Token frequency statistics effectively encode sample difficulty.
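The abstract names four features: token frequency, sequence length, vocabulary diversity, and rare token ratio. All four can be computed from a sample's token ids plus corpus-level unigram counts. The sketch below uses one plausible definition of each. The exact formulas, the add-one smoothing, the `rare_threshold` of 5, and the helper name `sample_features` are assumptions, because this page names the features without defining them.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def sample_features(
    token_ids: Sequence[int],
    corpus_counts: Dict[int, int],
    total_tokens: int,
    rare_threshold: int = 5,
) -> List[float]:
    """Compute four illustrative difficulty features for one sample.

    Assumed definitions; the paper's exact formulas are not given on this page.
    """
    n = len(token_ids)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0]
    # 1. Token frequency: mean negative log unigram frequency (higher = rarer tokens).
    #    Add-one smoothing keeps the log finite for tokens unseen in the corpus.
    mean_neg_log_freq = sum(
        -math.log((corpus_counts.get(t, 0) + 1) / (total_tokens + 1)) for t in token_ids
    ) / n
    # 2. Sequence length in tokens.
    length = float(n)
    # 3. Vocabulary diversity: type-token ratio (distinct tokens / total tokens).
    diversity = len(set(token_ids)) / n
    # 4. Rare token ratio: share of tokens whose corpus count is below the threshold.
    rare_ratio = sum(1 for t in token_ids if corpus_counts.get(t, 0) < rare_threshold) / n
    return [mean_neg_log_freq, length, diversity, rare_ratio]


# Usage: a toy corpus in which token 1 is common and tokens 2 and 3 are rare.
corpus = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3]
counts = Counter(corpus)
feats = sample_features([1, 2, 3, 3], counts, total_tokens=len(corpus))
```

Sequence length can be orders of magnitude larger than the ratio features. Standardizing each feature, for example with a running mean and standard deviation, before feeding a linear predictor is therefore a reasonable precaution.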
Abstract
We introduce Predictive Batch Scheduling (PBS), a novel training optimization technique that accelerates language model convergence by dynamically prioritizing high-loss samples during batch construction. Unlike curriculum learning approaches that require predefined difficulty metrics, or hard-example mining methods that demand expensive per-sample loss tracking, PBS employs a lightweight linear predictor trained online to estimate sample difficulty from static token-level features. Our predictor achieves 0.44 correlation with actual loss using only four simple features: token frequency, sequence length, vocabulary diversity, and rare token ratio. Experiments on a 130M-parameter transformer demonstrate that PBS achieves 6-13% faster convergence measured by evaluation loss across training checkpoints, with the predictor's correlation improving from 0.14 to 0.44 over 10,000 training…
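The abstract reports that the predictor's correlation with actual loss rises from 0.14 to 0.44 during training. This sketch assumes the figure is a Pearson correlation over recently observed (predicted, actual) loss pairs, which can be monitored with a sliding window. The window size and the `CorrelationTracker` name are hypothetical, and a rank correlation such as Spearman's would be an equally plausible reading.

```python
import math
from collections import deque
from typing import Deque, Tuple


class CorrelationTracker:
    """Pearson correlation between predicted and observed losses over a sliding window.

    Hypothetical monitoring helper; the window size is an assumption.
    """

    def __init__(self, window: int = 1000) -> None:
        # deque with maxlen drops the oldest pair once the window is full.
        self.pairs: Deque[Tuple[float, float]] = deque(maxlen=window)

    def add(self, predicted: float, observed: float) -> None:
        self.pairs.append((predicted, observed))

    def pearson(self) -> float:
        n = len(self.pairs)
        if n < 2:
            return float("nan")
        mean_p = sum(p for p, _ in self.pairs) / n
        mean_o = sum(o for _, o in self.pairs) / n
        cov = sum((p - mean_p) * (o - mean_o) for p, o in self.pairs)
        var_p = sum((p - mean_p) ** 2 for p, _ in self.pairs)
        var_o = sum((o - mean_o) ** 2 for _, o in self.pairs)
        if var_p == 0.0 or var_o == 0.0:
            return float("nan")  # correlation is undefined for a constant series
        return cov / math.sqrt(var_p * var_o)


# Usage: perfectly linear pairs give a correlation of 1 (up to rounding).
tracker = CorrelationTracker(window=100)
for p in [1.0, 2.0, 3.0, 4.0]:
    tracker.add(p, 2.0 * p + 1.0)
r = tracker.pearson()
```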
Taxonomy
Topics: Natural Language Processing Techniques · Machine Learning and Data Classification · Topic Modeling
