FLOP-Efficient Training: Early Stopping Based on Test-Time Compute Awareness
Hossam Amer, Maryam Dialameh, Hossein Rajabzadeh, Walid Ahmed, Weiwei Zhang, Yang Liu

TL;DR
This paper introduces a test-time compute-aware training method with early stopping that reduces training FLOPs by up to 92% while maintaining or improving model accuracy, balancing training and inference costs.
Contribution
It proposes a novel early stopping algorithm based on test-time compute awareness, with an efficient evaluation method and a formal bound to optimize training compute without losing accuracy.
Findings
Up to 92% reduction in training FLOPs achieved.
Maintains or improves accuracy with fewer training resources.
Provides a practical approach for balancing training and inference compute.
Abstract
Scaling training compute, measured in FLOPs, has long been shown to improve the accuracy of large language models, yet training remains resource-intensive. Prior work shows that increasing test-time compute (TTC), for example through iterative sampling, can allow smaller models to rival or surpass much larger ones at lower overall cost. We introduce TTC-aware training, where an intermediate checkpoint and a corresponding TTC configuration can together match or exceed the accuracy of a fully trained model while requiring substantially fewer training FLOPs. Building on this insight, we propose an early stopping algorithm that jointly selects a checkpoint and TTC configuration to minimize training compute without sacrificing accuracy. To make this practical, we develop an efficient TTC evaluation method that avoids exhaustive search, and we formalize a break-even bound that identifies when…
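The joint selection of a checkpoint and a TTC configuration described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses a naive grid scan rather than the paper's efficient evaluation method, and the names `Checkpoint`, `select_checkpoint`, `evaluate`, and `break_even_queries` are hypothetical. The break-even helper assumes a simple model in which the training FLOPs saved are traded against a fixed extra inference cost per query; the paper's formal bound is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Checkpoint:
    step: int           # training step at which the checkpoint was saved
    train_flops: float  # cumulative training FLOPs spent to reach it


def select_checkpoint(
    checkpoints: Sequence[Checkpoint],
    ttc_budgets: Sequence[int],
    evaluate: Callable[[Checkpoint, int], float],
    target_accuracy: float,
) -> Optional[Tuple[Checkpoint, int]]:
    """Return the cheapest checkpoint (by training FLOPs) and the smallest
    TTC budget (e.g. number of samples per query) whose accuracy reaches
    target_accuracy, or None if no pair qualifies.

    Assumption: `evaluate` is a user-supplied function returning accuracy
    for a (checkpoint, TTC budget) pair on a held-out set.
    """
    for ckpt in sorted(checkpoints, key=lambda c: c.train_flops):
        for budget in sorted(ttc_budgets):
            if evaluate(ckpt, budget) >= target_accuracy:
                return ckpt, budget
    return None


def break_even_queries(
    train_flops_saved: float,
    extra_inference_flops_per_query: float,
) -> float:
    """Number of inference queries after which the extra test-time compute
    cancels the training FLOPs saved (assumed simple linear cost model)."""
    if extra_inference_flops_per_query <= 0:
        return float("inf")  # no extra inference cost: never breaks even
    return train_flops_saved / extra_inference_flops_per_query


# Toy usage with made-up accuracies (illustrative only, not paper results).
toy_checkpoints = [
    Checkpoint(1000, 1e18),
    Checkpoint(2000, 2e18),
    Checkpoint(3000, 3e18),
]
toy_accuracy = {
    (1000, 1): 0.30, (1000, 4): 0.50,
    (2000, 1): 0.50, (2000, 4): 0.72,
    (3000, 1): 0.70, (3000, 4): 0.80,
}
choice = select_checkpoint(
    toy_checkpoints,
    ttc_budgets=[1, 4],
    evaluate=lambda c, k: toy_accuracy[(c.step, k)],
    target_accuracy=0.70,  # accuracy of the fully trained model at budget 1
)
```

In this toy setting the scan stops at the step-2000 checkpoint with a budget of 4 samples, since it matches the fully trained model's accuracy at two thirds of the training FLOPs. Whether that trade is worthwhile depends on the expected query volume, which is what a break-even calculation is meant to capture.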
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
