Balancing Label Imbalance in Federated Environments Using Only Mixup and Artificially-Labeled Noise
Kyle Sang, Tahseen Rabbani, Furong Huang

TL;DR
This paper proposes a simple augmentation method combining mixup and artificially generated noise to address label imbalance in federated learning, improving model training on skewed datasets.
Contribution
It introduces an augmentation approach that fills underrepresented label classes with pseudo-images, built from mixups of local data and artificially labeled natural noise, to mitigate label imbalance in federated environments.
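Filling underrepresented classes starts with knowing how many pseudo-images each class is missing. The sketch below is a hypothetical helper, not taken from the paper. It assumes each class is topped up to the count of the client's most frequent class, which is only one reasonable target among several. The name `augmentation_deficits` and its parameters are illustrative.

```python
from collections import Counter

def augmentation_deficits(labels, num_classes, target=None):
    """Return {class: number of pseudo-images needed to reach `target`}.

    Hypothetical helper: by default, `target` is the count of the most
    frequent local class, so the client ends up with a uniform histogram.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values(), default=0)
    # Classes already at or above the target need no augmentation.
    return {c: max(0, target - counts.get(c, 0)) for c in range(num_classes)}

# A client skewed toward classes 0 and 1 (made-up counts for illustration).
client_labels = [0] * 50 + [1] * 30 + [2] * 5
print(augmentation_deficits(client_labels, num_classes=4))
# → {0: 0, 1: 20, 2: 45, 3: 50}
```

Class 3 has no local samples here, so it cannot be filled by mixing its own images; a label-agnostic source such as artificially labeled noise is one way such a gap could be covered.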
Findings
Augmentation improves performance on skewed CIFAR-10 and MNIST datasets.
Combining natural noise with mixup helps homogenize the label distribution.
The method is effective with small amounts of augmentation.
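A mixup pseudo-image is a convex combination of real images. The minimal NumPy sketch below generates pseudo-images for a single class and then adds Laplacian noise in the spirit of DP-InstaHide (k-way mixup followed by additive Laplacian noise); the paper uses its own variant, whose details are not given here. Assumptions, not from the source: sources are drawn only from the target class so each pseudo-image keeps that class's hard label, mixing weights come from a uniform Dirichlet distribution, pixel values lie in [0, 1], and `k` and `noise_scale` are illustrative defaults. The function name `mixup_pseudo_images` is hypothetical.

```python
import numpy as np

def mixup_pseudo_images(images, n_new, k=2, noise_scale=0.1, rng=None):
    """Return `n_new` pseudo-images built from `images` (all one class).

    Hypothetical sketch: k-way mixup with Dirichlet(1, ..., 1) weights,
    then optional additive Laplacian noise, then clipping to [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    n = images.shape[0]
    # For each pseudo-image, pick k source indices (with replacement).
    idx = rng.integers(0, n, size=(n_new, k))
    # Convex mixing weights: each row is non-negative and sums to 1.
    lam = rng.dirichlet(np.ones(k), size=n_new)
    # Broadcast weights over the per-image axes and sum over the k sources.
    w = lam.reshape(lam.shape + (1,) * (images.ndim - 1))
    mixed = (w * images[idx]).sum(axis=1)
    if noise_scale > 0:
        # DP-InstaHide-style additive Laplacian noise (assumed scale).
        mixed = mixed + rng.laplace(0.0, noise_scale, size=mixed.shape)
    return np.clip(mixed, 0.0, 1.0)

rng = np.random.default_rng(0)
class_images = rng.random((5, 32, 32, 3))  # 5 real images of one minority class
pseudo = mixup_pseudo_images(class_images, n_new=45, rng=rng)
print(pseudo.shape)  # (45, 32, 32, 3)
```

Clipping after the noise keeps pixels in the assumed [0, 1] range; whether to clip, and how to choose `noise_scale` for a given privacy target, are design decisions this sketch does not settle.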
Abstract
Clients in a distributed or federated environment will often hold data skewed towards differing subsets of labels. This scenario, referred to as heterogeneous or non-iid federated learning, has been shown to significantly hinder model training and performance. In this work, we explore the limits of a simple yet effective augmentation strategy for balancing skewed label distributions: filling in underrepresented samples of a particular label class using pseudo-images. While existing algorithms exclusively train on pseudo-images such as mixups of local training data, our augmented client datasets consist of both real and pseudo-images. In further contrast to other literature, we (1) use a DP-Instahide variant to reduce the decodability of our image encodings and (2) as a twist, supplement local data using artificially labeled, training-free 'natural noise' generated by an untrained…
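The abstract is cut off before it names the untrained generator, so the sketch below swaps in a hypothetical stand-in rather than the paper's model: a small, randomly initialized coordinate-based network (in the style of a CPPN) that maps pixel coordinates and a random latent vector to colors. Nothing is trained, which is the property the abstract emphasizes, and every output simply receives an artificial label chosen by the caller. The function `natural_noise` and all of its parameters are illustrative assumptions.

```python
import numpy as np

def natural_noise(n, label, height=32, width=32, channels=3,
                  latent_dim=8, hidden=32, seed=0):
    """Return (images, labels): `n` structured-noise images from an
    untrained random network, all tagged with the artificial `label`.

    Hypothetical stand-in generator; weights are random and never trained.
    """
    rng = np.random.default_rng(seed)
    # Per-pixel inputs: x, y in [-1, 1] and distance from the center.
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing="ij")
    coords = np.stack([xs, ys, np.sqrt(xs**2 + ys**2)], axis=-1).reshape(-1, 3)

    # One randomly initialized (untrained) network shared by all samples.
    w1 = rng.normal(0.0, 1.0, size=(3 + latent_dim, hidden))
    w2 = rng.normal(0.0, 1.0, size=(hidden, hidden)) / np.sqrt(hidden)
    w3 = rng.normal(0.0, 1.0, size=(hidden, channels)) / np.sqrt(hidden)

    images = np.empty((n, height, width, channels))
    for i in range(n):
        # A fresh latent vector per image, repeated for every pixel.
        z = np.tile(rng.normal(size=latent_dim), (coords.shape[0], 1))
        h = np.tanh(np.concatenate([coords, z], axis=1) @ w1)
        h = np.tanh(h @ w2)
        rgb = 1.0 / (1.0 + np.exp(-(h @ w3)))  # sigmoid maps to (0, 1)
        images[i] = rgb.reshape(height, width, channels)

    labels = np.full(n, label, dtype=np.int64)
    return images, labels

imgs, labels = natural_noise(n=50, label=3)
print(imgs.shape, labels[:3])  # (50, 32, 32, 3) [3 3 3]
```

Setting `label` to an underrepresented class is one way to top up that class. A coordinate network yields smooth, spatially structured patterns rather than i.i.d. pixel noise, but its outputs are not claimed to match those of the generator the paper uses.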
