Efficient Training of Deep Networks using Guided Spectral Data Selection: A Step Toward Learning What You Need
Mohammadreza Sharifi, Ahad Harati

TL;DR
This paper introduces GSTDS, a spectral analysis-based data selection algorithm that dynamically reduces training data, leading to faster training and improved accuracy with limited computational resources.
Contribution
GSTDS is a novel spectral analysis-driven data curation method that enhances training efficiency and accuracy compared to existing approaches.
Findings
GSTDS reduces computational costs by up to a factor of four.
GSTDS outperforms standard training and JEST on image classification benchmarks.
GSTDS improves accuracy under limited computational resources.
Abstract
Effective data curation is essential for optimizing neural network training. In this paper, we present the Guided Spectrally Tuned Data Selection (GSTDS) algorithm, which dynamically adjusts the subset of data points used for training by means of an off-the-shelf pre-trained reference model. Following a pre-scheduled filtering ratio, GSTDS reduces the number of data points processed per batch. The proposed method selects the most informative data points for training while avoiding redundant or less beneficial computations. Which data points to keep in each batch is decided by spectral analysis: a Fiedler vector-based scoring mechanism removes the filtered portion of the batch, lightening the resource requirements of learning. The proposed data selection approach not only streamlines the training process but also promotes improved generalization…
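To make the abstract's spectral step concrete, the following is a minimal, hypothetical sketch of Fiedler vector-based batch filtering, not the authors' implementation. It assumes that each batch is represented by embeddings from the reference model (random vectors stand in for them here), that the graph is a fully connected Gaussian-kernel similarity graph with a median-heuristic bandwidth, that points with the largest |Fiedler value| are the ones kept, and that the filtering ratio follows a linear schedule. The function names `fiedler_vector`, `select_batch`, and `filter_ratio_schedule` are illustrative only.

```python
import numpy as np


def fiedler_vector(embeddings, sigma=None):
    """Fiedler vector of a Gaussian-kernel similarity graph over one batch.

    Assumption: a fully connected graph with RBF edge weights; the abstract
    does not specify how GSTDS builds its graph.
    """
    n = embeddings.shape[0]
    if n < 2:
        raise ValueError("need at least two points to form a graph")
    # Pairwise squared Euclidean distances between embeddings.
    sq_norms = np.sum(embeddings ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embeddings @ embeddings.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    if sigma is None:
        # Median heuristic for the kernel bandwidth (an assumption).
        off_diag = sq_dists[~np.eye(n, dtype=bool)]
        sigma = max(np.sqrt(np.median(off_diag)), 1e-12)
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)
    # Unnormalized graph Laplacian L = D - W.
    laplacian = np.diag(weights.sum(axis=1)) - weights
    # eigh sorts eigenvalues in ascending order; column 1 belongs to the
    # second-smallest eigenvalue, i.e. it is the Fiedler vector.
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, 1]


def select_batch(embeddings, filter_ratio, sigma=None):
    """Hypothetical scoring: keep the (1 - filter_ratio) share of points
    with the largest |Fiedler value|; returns their sorted indices."""
    n = embeddings.shape[0]
    n_keep = max(1, int(round(n * (1.0 - filter_ratio))))
    scores = np.abs(fiedler_vector(embeddings, sigma))
    return np.sort(np.argsort(-scores)[:n_keep])


def filter_ratio_schedule(step, total_steps, max_ratio=0.5):
    """Hypothetical linear schedule: the filtered fraction ramps from 0 to max_ratio."""
    return max_ratio * min(step / total_steps, 1.0)


# Usage: random vectors stand in for reference-model embeddings.
rng = np.random.default_rng(0)
batch = rng.normal(size=(32, 16))
ratio = filter_ratio_schedule(step=500, total_steps=1000)  # 0.25
kept = select_batch(batch, ratio)
print(len(kept))  # 24 of 32 points are kept
```

The Fiedler vector is used because it is the eigenvector that partitions a similarity graph most smoothly; how GSTDS actually turns it into per-sample scores, and how the reference model guides the selection, would need to be checked against the full paper.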
