TED: Accelerate Model Training by Internal Generalization

Jinying Xiao; Ping Li; Jie Nie

arXiv:2405.03228·cs.LG·September 16, 2025

TL;DR

TED pruning introduces an internal generalization measure for pruning training data, enabling significant dataset compression while maintaining model performance.

Contribution

The paper proposes TED pruning, a dataset pruning method whose optimization objective is based on Internal Generalization Distance (IGD), with the aim of reducing training costs.

Findings

01

Achieves lossless performance using 60-70% of the data.

02

Validates Internal Generalization Distance (IGD) as an effective proxy for generalization.

03

Demonstrates applicability across image, language understanding, and LLM fine-tuning.

Abstract

Large language models have demonstrated strong performance in recent years, but the high cost of training drives the need for efficient methods to compress dataset sizes. We propose TED pruning, a method that addresses the challenge of overfitting under high pruning ratios by quantifying the model's ability to improve performance on pruned data while fitting retained data, known as Internal Generalization (IG). TED uses an optimization objective based on Internal Generalization Distance (IGD), measuring changes in IG before and after pruning to align with true generalization performance and achieve implicit regularization. The IGD optimization objective was verified to allow the model to achieve the smallest upper bound on generalization error. The impact of small mask fluctuations on IG is studied through masks and Taylor approximation, and fast estimation of IGD is enabled. In…
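The idea of internal generalization described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation and does not compute IGD; it assumes a linear regression model, a single gradient step on the retained samples, and a binary keep/prune mask. The function names (`internal_generalization`, `internal_generalization_taylor`) and all parameter values are hypothetical. The sketch measures how much the loss on the pruned samples drops after one step of fitting the retained samples, and compares that exact value with a first-order Taylor estimate, which is simply the learning rate times the inner product of the two gradients.

```python
import numpy as np

def mse_loss(w, X, y):
    """Half mean squared error of a linear model."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def mse_grad(w, X, y):
    """Gradient of mse_loss with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def internal_generalization(w, X, y, mask, lr):
    """Exact drop in pruned-set loss after one gradient step on the retained set.

    Assumption: this one-step definition is an illustrative reading of the
    abstract, not the paper's formal definition of IG.
    """
    keep = mask.astype(bool)
    drop = ~keep
    w_new = w - lr * mse_grad(w, X[keep], y[keep])
    return mse_loss(w, X[drop], y[drop]) - mse_loss(w_new, X[drop], y[drop])

def internal_generalization_taylor(w, X, y, mask, lr):
    """First-order Taylor estimate of the same quantity: lr * <g_drop, g_keep>."""
    keep = mask.astype(bool)
    drop = ~keep
    g_keep = mse_grad(w, X[keep], y[keep])
    g_drop = mse_grad(w, X[drop], y[drop])
    return lr * float(g_drop @ g_keep)

# Synthetic data (hypothetical sizes and noise level).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)

w0 = np.zeros(5)                               # untrained model
mask = (rng.random(200) < 0.7).astype(int)     # 1 = retained, 0 = pruned

exact = internal_generalization(w0, X, y, mask, lr=1e-3)
approx = internal_generalization_taylor(w0, X, y, mask, lr=1e-3)
print(f"exact IG: {exact:.6e}, first-order estimate: {approx:.6e}")
```

For a small learning rate the two values agree closely, which is why a Taylor expansion is a natural tool for fast estimation: it needs only gradients at the current parameters, with no retraining for each candidate mask.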

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Advanced Data Processing Techniques

Methods: ALIGN · Pruning