Equally Critical: Samples, Targets, and Their Mappings in Datasets
Runkang Yang, Peng Sun, Xinyi Shang, Yi Tang, Tao Lin

TL;DR
This paper investigates the combined influence of samples and targets in datasets on training efficiency, proposing a unified framework and empirical analysis to improve data-driven model training.
Contribution
It introduces a taxonomy of sample-target interactions and a unified loss framework, advancing understanding of their joint impact on training dynamics.
Findings
Target and sample variations significantly affect training efficiency
The proposed strategies improve model convergence speed
Six key insights guide data optimization for training
Abstract
Data inherently possesses dual attributes: samples and targets. For targets, knowledge distillation has been widely employed to accelerate model convergence, primarily relying on teacher-generated soft target supervision. Conversely, recent advancements in data-efficient learning have emphasized sample optimization techniques, such as dataset distillation, while neglecting the critical role of targets. This dichotomy motivates our investigation into how samples and targets collectively influence training dynamics. To address this gap, we first establish a taxonomy of existing paradigms through the lens of sample-target interactions, categorizing them into distinct sample-to-target mapping strategies. Building upon this foundation, we then propose a novel unified loss framework to assess their impact on training efficiency. Through extensive empirical studies on our…
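The contrast the abstract draws between conventional hard labels and teacher-generated soft targets can be sketched as two different sample-to-target mappings fed into the same cross-entropy. This is a minimal NumPy illustration only, not the paper's unified loss framework, which the truncated abstract does not specify. The helper names, the temperature of 2.0, and the random logits standing in for student and teacher outputs are all assumptions made for the example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with an optional temperature (assumed hyperparameter)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def one_hot(labels, num_classes):
    """Hard targets: each sample maps to a single class."""
    return np.eye(num_classes)[labels]

def cross_entropy(student_logits, targets):
    """Mean cross-entropy between target distributions and student predictions.

    Works for both one-hot (hard) and teacher-generated (soft) targets.
    """
    log_probs = np.log(softmax(student_logits) + 1e-12)
    return float(-(targets * log_probs).sum(axis=-1).mean())

# Hypothetical data: 4 samples, 3 classes. Random logits stand in for real models.
rng = np.random.default_rng(0)
student_logits = rng.normal(size=(4, 3))
teacher_logits = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 0])

hard_targets = one_hot(labels, num_classes=3)             # sample -> one-hot label
soft_targets = softmax(teacher_logits, temperature=2.0)   # sample -> teacher distribution

hard_loss = cross_entropy(student_logits, hard_targets)
soft_loss = cross_entropy(student_logits, soft_targets)
print(f"hard-target loss: {hard_loss:.4f}")
print(f"soft-target loss: {soft_loss:.4f}")
```

Only the target passed to `cross_entropy` changes between the two calls, which is the sense in which the choice of sample-to-target mapping can be varied independently of the samples themselves.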
Peer Reviews
Decision · Submitted to ICLR 2025
Comprehensive Analysis: The paper provides a thorough investigation of how different sample-to-target mappings and data augmentation strategies affect training efficiency, offering valuable insights.
Novel Perspective on Targets: By highlighting the often-neglected role of targets in dataset design, the paper contributes to a more holistic understanding of data-efficient learning.
Unified Loss Framework: The introduction of a unified loss function that separates the backbone training from the
Theoretical Analysis: It would be ideal to provide a theoretical framework or intuition to explain the empirical observations, especially concerning why weaker teacher models can aid early learning and why STRATEGY C effectively reduces noise. This addition would be a nice enhancement rather than any requirement, but I am not allowed to leave this section blank.🥸
They pose an interesting question: how different target encodings influence the training efficiency of neural networks. Interesting experiments are designed that investigate questions such as how the quality of labels affects the accuracy of the student during different stages of training, whether better teacher performance always entails better student performance, and how data augmentation for the student and the teacher interact. All experiments are repeated at least five times. The paper is
The experiments fail to consider other possibly relevant factors. For example, it is possible that strategy A in the results from Figure 3a simply needs a different learning rate. While the experiments are repeated at least five times, no uncertainty quantification (such as standard error) is included in the plots or the analysis. Research data: no code is provided, and experiment results are not included, e.g. as CSV files. While interesting experiments are designed and phenomena are observed, little
Data and computational efficiency are highly relevant practical problems. As we reach fundamental upper limits on the possible size of training datasets, finding ways to improve the neural scaling laws that have been observed until now will be essential for continuing to improve model capabilities. Thus, the stated problem under study is relevant to the ICLR community.
The main thrust of the paper is that, in the context of improving neural scaling laws, their "finding underscores the significance of the exploration the target component, a frequently overlooked aspect in the deep learning community." The discussion of neural scaling laws centers on the fact that exponentially larger datasets are needed to achieve only marginal performance improvements; in particular, these scaling laws are a problem only once we have reached the "extremely large dataset regime
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Stochastic Gradient Optimization Techniques
