Equally Critical: Samples, Targets, and Their Mappings in Datasets

Runkang Yang; Peng Sun; Xinyi Shang; Yi Tang; Tao Lin

arXiv:2506.01987·cs.LG·June 4, 2025

Open Access 3 Reviews

TL;DR

This paper investigates the combined influence of samples and targets in datasets on training efficiency, proposing a unified framework and empirical analysis to improve data-driven model training.

Contribution

It introduces a taxonomy of sample-target interactions and a unified loss framework, advancing understanding of their joint impact on training dynamics.

Findings

01. Target and sample variations significantly affect training efficiency

02. The proposed strategies improve model convergence speed

03. Six key insights guide data optimization for training

Abstract

Data inherently possesses dual attributes: samples and targets. For targets, knowledge distillation has been widely employed to accelerate model convergence, primarily relying on teacher-generated soft target supervision. Conversely, recent advancements in data-efficient learning have emphasized sample optimization techniques, such as dataset distillation, while neglecting the critical role of targets. This dichotomy motivates our investigation into how samples and targets collectively influence training dynamics. To address this gap, we first establish a taxonomy of existing paradigms through the lens of sample-target interactions, categorizing them into distinct sample-to-target mapping strategies. Building upon this foundation, we then propose a novel unified loss framework to assess their impact on training efficiency. Through extensive empirical studies on our…
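As a rough illustration of the "teacher-generated soft target supervision" the abstract refers to, the sketch below mixes a hard-label cross-entropy term with a cross-entropy against temperature-softened teacher outputs. This is the generic knowledge-distillation recipe, not the paper's unified loss framework, whose exact form is not given in this excerpt. The function names and the `alpha` and `temperature` parameters are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, student_logits, temperature=1.0):
    """Cross-entropy between a target distribution and the student's softmax."""
    student_probs = softmax(student_logits, temperature)
    return -sum(
        t * math.log(p) for t, p in zip(target_probs, student_probs) if t > 0
    )


def one_hot(label, num_classes):
    """Hard target: all probability mass on the labelled class."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixed_target_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Hypothetical per-sample loss: alpha weights the hard-label term,
    (1 - alpha) weights the soft-target term from the teacher.
    The T^2 gradient rescaling often used in distillation is omitted."""
    hard = cross_entropy(one_hot(label, len(student_logits)), student_logits)
    soft_targets = softmax(teacher_logits, temperature)
    soft = cross_entropy(soft_targets, student_logits, temperature)
    return alpha * hard + (1 - alpha) * soft


if __name__ == "__main__":
    # Made-up logits for a 3-class problem, for demonstration only.
    student = [1.2, 0.3, -0.5]
    teacher = [2.0, 0.5, -1.0]
    print(mixed_target_loss(student, teacher, label=0))
```

Setting `alpha=1.0` recovers plain one-hot training, while `alpha=0.0` trains on the teacher's soft targets alone; the two extremes correspond to different sample-to-target mappings for the same sample.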

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01·Rating 6·Confidence 3

Strengths

Comprehensive Analysis: The paper provides a thorough investigation of how different sample-to-target mappings and data augmentation strategies affect training efficiency, offering valuable insights.

Novel Perspective on Targets: By highlighting the often-neglected role of targets in dataset design, the paper contributes to a more holistic understanding of data-efficient learning.

Unified Loss Framework: The introduction of a unified loss function that separates the backbone training from the

Weaknesses

Theoretical Analysis: It would be ideal to provide a theoretical framework or intuition to explain the empirical observations, especially concerning why weaker teacher models can aid early learning and why STRATEGY C effectively reduces noise. This addition would be a nice enhancement rather than any requirement, but I am not allowed to leave this section blank.🥸

Reviewer 02·Rating 5·Confidence 3

Strengths

They pose an interesting question: how different target encodings influence the training efficiency of neural networks.

Interesting experiments are designed that investigate questions such as how the quality of labels affects the accuracy of the student during different stages of training, whether better teacher performance always entails better student performance, and how data augmentation for the student and the teacher interact.

All experiments are repeated at least five times.

The paper is

Weaknesses

The experiments fail to consider other possibly relevant factors. For example, it is possible that strategy A in the results from Figure 3a simply needs a different learning rate.

While the experiments are repeated at least five times, no uncertainty quantification (such as standard error) is included in the plots or the analysis.

Research data: No code is provided. Experiment results are not included, e.g. as CSV files.

While interesting experiments are designed and phenomena are observed, little

Reviewer 03·Rating 3·Confidence 3

Strengths

Data and computational efficiency are highly relevant practical problems. As we reach fundamental upper limits on the possible size of training datasets, finding ways to improve the neural scaling laws that have been observed until now will be essential for continuing to improve model capabilities. Thus, the stated problem under study is relevant to the ICLR community.

Weaknesses

The main thrust of the paper is that, in the context of improving neural scaling laws, their "finding underscores the significance of the exploration the target component, a frequently overlooked aspect in the deep learning community." The discussion of neural scaling laws centers on the fact that exponentially larger datasets are needed to achieve only marginal performance improvements; in particular, these scaling laws are a problem only once we have reached the "extremely large dataset regime

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Stochastic Gradient Optimization Techniques