Training Neural Networks on Data Sources with Unknown Reliability
Alexander Capstick, Francesca Palermo, Tianyu Cui, Payam Barnaghi

TL;DR
This paper proposes a dynamic re-weighting strategy for training neural networks on multiple data sources with unknown reliability, improving performance by adjusting focus based on estimated data quality.
Contribution
It introduces a method inspired by likelihood tempering that adaptively weights data sources during training, addressing unknown data reliability in supervised learning.
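The idea of scaling each source's loss by a reliability-dependent factor can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the exponential weighting rule, and the `sharpness` and `warmup_steps` parameters are all assumptions made for illustration. The only things taken from the text are that each source's negative log-likelihood is scaled by a source-specific factor and that all sources are trained on equally during a warm-up phase.

```python
import math
from collections import defaultdict


def mean_loss_by_source(per_example_nll, sources):
    """Average the per-example negative log-likelihood within each source."""
    grouped = defaultdict(list)
    for nll, source in zip(per_example_nll, sources):
        grouped[source].append(nll)
    return {s: sum(v) / len(v) for s, v in grouped.items()}


def source_weights(mean_losses, step, warmup_steps, sharpness=1.0):
    """Hypothetical reliability rule (an assumption, not the paper's estimator).

    During warm-up every source gets weight 1. Afterwards the source with the
    lowest mean loss keeps weight 1 and the others are down-weighted
    exponentially in their excess loss.
    """
    if step < warmup_steps:
        return {s: 1.0 for s in mean_losses}
    best = min(mean_losses.values())
    return {s: math.exp(-sharpness * (m - best)) for s, m in mean_losses.items()}


def tempered_nll(per_example_nll, sources, weights):
    """Total loss: each example's NLL scaled by the weight of its source."""
    return sum(weights[s] * nll for nll, s in zip(per_example_nll, sources))


# Toy batch: source "a" fits well, source "b" has much higher loss.
per_example_nll = [0.2, 0.4, 2.5, 3.1]
sources = ["a", "a", "b", "b"]
mean_losses = mean_loss_by_source(per_example_nll, sources)

warm = source_weights(mean_losses, step=0, warmup_steps=10)
later = source_weights(mean_losses, step=50, warmup_steps=10)

print("warm-up loss:", tempered_nll(per_example_nll, sources, warm))
print("tempered loss:", tempered_nll(per_example_nll, sources, later))
```

In this toy batch the high-loss source contributes far less to the total after warm-up, which is the qualitative behaviour the summary describes. How reliability is actually estimated in the paper is not reproduced here.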
Findings
Significant performance improvements on mixed reliable and unreliable data sources.
Maintains high accuracy when trained solely on reliable sources.
Effective in diverse experimental settings.
Abstract
When data is generated by multiple sources, conventional training methods update models assuming equal reliability for each source and do not consider their individual data quality. However, in many applications, sources have varied levels of reliability that can have negative effects on the performance of a neural network. A key issue is that often the quality of the data for individual sources is not known during training. Previous methods for training models in the presence of noisy data do not make use of the additional information that the source label can provide. Focusing on supervised learning, we aim to train neural networks on each data source for a number of steps proportional to the source's estimated reliability by using a dynamic re-weighting strategy motivated by likelihood tempering. This way, we allow training on all sources during the warm-up and reduce learning on…
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
* The paper is overall organized well.
* The authors conducted extensive experiments to verify the effectiveness of their method.
* The method needs further clarification. I am looking at Section 3, the first two subsections (Tempered likelihood and source reliability estimation). For the first subsection, the introduction of $f(C_s)$ as "temperature" is a little confusing. From my understanding, temperature is usually used to scale the logits, while $f(C_s)$ here is used more to scale the NLL loss.
* There is a similar clarification issue for the second subsection: I wasn't able to follow the "Intuitive
1. The paper addresses the issue of how to make learning algorithms more robust when some data sources in the dataset are unreliable, which is meaningful for the application of models in real-world scenarios.
2. The experiments validate the proposed method on datasets of varying sizes, proving that the method can handle unreliable data to a certain extent.
1. The method proposed in this paper is too trivial. The method uses a reweighting mechanism to control the contribution of different data sources to model training, which is essentially a special case of instance reweighting. The reweighting method proposed in this paper does not offer much novelty compared to previous work, aside from being set-wise rather than instance-wise.
2. The core part of the proposed method is the SOURCE RELIABILITY ESTIMATION section, which directly uses the historical outp
The method shows reasonable empirical performance on constructed multi-source training problems.
- The writing quality can be improved. Examples of lines which I find hard to follow: 091-098.
- Notation could be made more consistent, e.g. in (1) $p(D\mid \theta)$ is used, then after (2) $p_{\theta}(D)$ is used. Do we need the $f(.)$?
- The method is basically tempering the losses from different sources to compute the total loss. Transformation in equation (1) is only loosely motivated. Do we really need to make the assumption? Also the assumption that neural networks achieve a lower empiri
1. The learning scenario proposed in this paper is of significant practical value. The modeling of data sources here shows some parallels with domain generalization, and is a useful complement to federated learning and learning from noisy data.
2. The paper provides a comprehensive empirical analysis for the proposed algorithm, validating its performance across a wide range of applications.
1. The main contribution of this work lies in introducing a new OOD/robust learning task. However, the problem lacks a clear mathematical formulation. While the "Motivation for New Methods" section provides an analysis and explanation of the problem, a formal proposition explicitly describing the task, learning objective, and corresponding notations would strengthen the paper.
2. The theoretical modeling of the method is somewhat simplistic. For example, in "Tempered Likelihood," data sources a
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Data Stream Mining Techniques
