Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
Everlyn Asiko Chimoto, Jay Gala, Orevaoghene Ahia, Julia Kreutzer, Bruce A. Bassett, Sara Hooker

TL;DR
This paper introduces CAT, a data pruning method that uses early training dynamics to select relevant data points, significantly reducing training data size while maintaining translation quality.
Contribution
The paper presents a novel data pruning technique, Checkpoints Across Time (CAT), leveraging early training signals to improve efficiency in neural machine translation.
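As a rough illustration of what pruning on early training signals can look like, the sketch below scores each training example by how much its perplexity varies across a few early checkpoints and keeps the most variable half. This summary does not state CAT's actual scoring rule, so the variance score, the function name `prune_by_early_dynamics`, and the input layout are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def prune_by_early_dynamics(perplexities, keep_fraction=0.5):
    """Return sorted indices of the examples to keep.

    perplexities: array-like of shape (n_checkpoints, n_examples) holding
    each example's perplexity at a few early checkpoints (hypothetical input).
    """
    perplexities = np.asarray(perplexities, dtype=float)
    if perplexities.ndim != 2:
        raise ValueError("expected shape (n_checkpoints, n_examples)")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Assumed score: variance of each example's perplexity across checkpoints.
    scores = perplexities.var(axis=0)

    # Keep at least one example; truncate toward zero otherwise.
    n_keep = max(1, int(keep_fraction * scores.size))

    # Stable descending sort so ties resolve by original position.
    order = np.argsort(-scores, kind="stable")
    return np.sort(order[:n_keep])


if __name__ == "__main__":
    # Three early checkpoints, four training examples (made-up numbers).
    ppl = [
        [10.0, 10.0, 10.0, 10.0],
        [9.0, 5.0, 10.0, 7.0],
        [8.0, 2.0, 10.0, 4.0],
    ]
    print(prune_by_early_dynamics(ppl, keep_fraction=0.5))
```

In practice the perplexities would come from scoring the training set with the model saved at each early checkpoint; a different statistic, such as the difference between two checkpoints, could be swapped in for the variance.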
Findings
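CAT outperforms the COMET-QE, LASER, and LaBSE pruning baselines on Indo-European language pairs across multiple test sets.
CAT prunes up to 50% of the training data while staying comparable to full-dataset performance on English-German, English-French, and English-Swahili.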
Selected data tends to include longer, rarer sentences.
Abstract
Neural Machine Translation models are extremely data- and compute-hungry. However, not all data points contribute equally to model training and generalization. Data pruning to remove the low-value data points has the benefit of drastically reducing the compute budget without a significant drop in model performance. In this paper, we propose a new data pruning technique, Checkpoints Across Time (CAT), which leverages early model training dynamics to identify the most relevant data points for model performance. We benchmark CAT against several data pruning techniques including COMET-QE, LASER and LaBSE. We find that CAT outperforms the benchmarks on Indo-European languages on multiple test sets. When applied to English-German, English-French and English-Swahili translation tasks, CAT achieves comparable performance to using the full dataset, while pruning up to 50% of training data. We…
Taxonomy
Topics: Statistics Education and Methodologies
Methods: Pruning
