Balancing Label Imbalance in Federated Environments Using Only Mixup and Artificially-Labeled Noise

Kyle Sang; Tahseen Rabbani; Furong Huang

arXiv:2409.13235·cs.LG·September 23, 2024

TL;DR

This paper proposes a simple augmentation method combining mixup and artificially generated noise to address label imbalance in federated learning, improving model training on skewed datasets.

Contribution

The paper introduces an augmentation approach that combines pseudo-images, mixup, and natural noise to mitigate label imbalance in federated environments.

Findings

01. Augmentation improves performance on skewed CIFAR-10 and MNIST datasets.

02. Using natural noise with mixup enhances label distribution homogenization.

03. The method is effective with small amounts of augmentation.

Abstract

Clients in a distributed or federated environment will often hold data skewed towards differing subsets of labels. This scenario, referred to as heterogeneous or non-iid federated learning, has been shown to significantly hinder model training and performance. In this work, we explore the limits of a simple yet effective augmentation strategy for balancing skewed label distributions: filling in underrepresented samples of a particular label class using pseudo-images. While existing algorithms exclusively train on pseudo-images such as mixups of local training data, our augmented client datasets consist of both real and pseudo-images. In further contrast to other literature, we (1) use a DP-Instahide variant to reduce the decodability of our image encodings and (2) as a twist, supplement local data using artificially labeled, training-free 'natural noise' generated by an untrained…
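The balancing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard mixup formulation (a convex combination of two inputs and of their one-hot labels), uses uniform random pixels as a stand-in for the paper's 'natural noise', and omits the DP-Instahide encoding entirely. The names `mixup`, `one_hot`, and `balance_client`, the `alpha` parameter, and the choice to fill every class up to the size of the largest local class are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def one_hot(label, num_classes):
    """Encode an integer label as a one-hot list of floats."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def mixup(x_a, y_a, x_b, y_b, lam):
    """Standard mixup: convex combination of two flat images and their labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def balance_client(images, labels, num_classes, rng, alpha=1.0):
    """Return the real samples plus pseudo-images for underrepresented classes.

    Assumption: each class is topped up to the count of the largest local class.
    Assumption: the noise image is uniform random pixels, artificially labeled
    with the class being filled; the paper's noise generator is not reproduced.
    """
    counts = Counter(labels)
    target = max(counts.values())
    dim = len(images[0])

    # Keep every real sample, so the result mixes real and pseudo-images.
    out_x = list(images)
    out_y = [one_hot(label, num_classes) for label in labels]

    for c in range(num_classes):
        deficit = target - counts.get(c, 0)
        for _ in range(deficit):
            noise = [rng.random() for _ in range(dim)]
            i = rng.randrange(len(images))
            lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
            x, y = mixup(
                noise, one_hot(c, num_classes),
                images[i], one_hot(labels[i], num_classes),
                lam,
            )
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

Calling `balance_client(images, labels, num_classes, random.Random(0))` on a client whose images are flat lists of pixel values returns the original samples followed by the pseudo-images, each with a soft label. Because the labels are soft, the added label mass per class is only approximately balanced in this sketch.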

Taxonomy

Topics: Internet Traffic Analysis and Secure E-voting · Power Line Communications and Noise