Learning from Data with Heterogeneous Noise using SGD
Shuang Song, Kamalika Chaudhuri, Anand D. Sarwate

TL;DR
This paper studies how stochastic gradient descent (SGD) can learn from data sources with different noise levels. It proposes adapting the learning rate to the noise level of each source, provides theoretical regret bounds, and reports experiments in which the adaptive strategy outperforms fixed-rate approaches.
Contribution
The paper introduces a method that adapts the SGD learning rate to heterogeneous noise levels and provides theoretical regret bounds for this approach.
Findings
Adapting the learning rate to the noise level improves performance when the noise is heterogeneous.
Given two datasets with different noise levels, the order in which to use them in standard SGD depends on the learning rate.
Experiments on real data show that the method outperforms fixed-rate approaches.
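The setting behind these findings can be illustrated with a small simulation. The sketch below is not the authors' algorithm or experimental setup. It assumes a toy quadratic objective, Gaussian gradient noise, and a 1/t step size, and the names run_sgd, step_scale, and rate_multiplier are hypothetical. It only shows how one might compare the two orderings of a clean and a noisy source, with an optional per-source multiplier on the learning rate.

```python
import numpy as np


def run_sgd(w_star, schedule, step_scale=1.0, seed=0):
    """Run SGD on the toy objective f(w) = 0.5 * ||w - w_star||^2.

    schedule is a list of (num_steps, noise_std, rate_multiplier) tuples,
    one per data source, processed in the order given. Each source adds
    Gaussian noise with standard deviation noise_std to the gradient.
    All of this is an illustrative assumption, not the paper's model.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros_like(w_star, dtype=float)
    t = 0
    for num_steps, noise_std, rate_multiplier in schedule:
        for _ in range(num_steps):
            t += 1
            # Exact gradient of the quadratic plus source-dependent noise.
            grad = (w - w_star) + noise_std * rng.standard_normal(w.shape)
            # Base step size step_scale / t, scaled per source.
            eta = rate_multiplier * step_scale / t
            w = w - eta * grad
    # Distance of the final iterate from the true minimizer.
    return float(np.linalg.norm(w - w_star))


if __name__ == "__main__":
    w_star = np.array([1.0, -2.0])
    clean = (500, 0.1, 1.0)  # low-noise source
    noisy = (500, 2.0, 1.0)  # high-noise source
    for step_scale in (0.2, 1.0, 5.0):
        clean_first = run_sgd(w_star, [clean, noisy], step_scale)
        noisy_first = run_sgd(w_star, [noisy, clean], step_scale)
        print(step_scale, clean_first, noisy_first)
```

Running the script prints the final error for both orderings at several values of step_scale. A single seed gives only one noisy draw, so averaging over many seeds would be needed before reading anything into the comparison. Setting rate_multiplier below 1.0 for the noisy source is one simple way to try a noise-dependent learning rate in this toy setting.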
Abstract
We consider learning from data of variable quality that may be obtained from different heterogeneous sources. Addressing learning from heterogeneous data in its full generality is a challenging problem. In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source. We study how to use stochastic gradient algorithms to learn in this model. Our study is motivated by two concrete examples where this problem arises naturally: learning with local differential privacy based on data from multiple sources with different privacy requirements, and learning from data with labels of variable quality. The main contribution of this paper is to identify how heterogeneous noise impacts performance. We show that given two datasets with heterogeneous noise, the order in which to use them in standard SGD…
Taxonomy
Topics: Statistical Methods and Inference · Gaussian Processes and Bayesian Inference · Privacy-Preserving Technologies in Data
Methods: Stochastic Gradient Descent
