Better May Not Be Fairer: A Study on Subgroup Discrepancy in Image Classification

Ming-Chang Chiu; Pin-Yu Chen; Xuezhe Ma

arXiv:2212.08649·cs.CV·September 25, 2023

TL;DR

This paper investigates how natural background colors as spurious features affect image classification, introduces datasets with annotated backgrounds, and proposes a semantic data augmentation method, FlowAug, to improve subgroup performance and robustness.

Contribution

It provides annotated datasets highlighting background spurious features, introduces FlowAug for semantic data augmentation, and proposes MacroStd as a new metric for model robustness to spurious correlations.

Findings

1. FlowAug improves subgroup consistency and generalization.
2. Background color influences model performance across datasets.
3. MacroStd correlates with improved robustness and subgroup performance.

Abstract

In this paper, we provide 20,000 non-trivial human annotations on popular datasets as a first step toward bridging the gap in studying how natural semantic spurious features affect image classification, as prior works often study datasets mixing low-level features due to limitations in accessing realistic datasets. We investigate how natural background colors play a role as spurious features by annotating the test sets of CIFAR10 and CIFAR100 into subgroups based on the background color of each image. We name our datasets CIFAR10-B and CIFAR100-B and integrate them with CIFAR-Cs. We find that overall human-level accuracy does not guarantee consistent subgroup performance, and the phenomenon remains even on models pre-trained on ImageNet or after data augmentation (DA). To alleviate this issue, we propose FlowAug, a semantic DA that leverages decoupled semantic…
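The gap between overall accuracy and per-subgroup accuracy described in the abstract can be illustrated with a small sketch. It assumes each test image carries a background-color label alongside its class label; the function names, the toy labels, and the use of a population standard deviation over subgroup accuracies are illustrative assumptions, and this spread is not claimed to be the paper's MacroStd definition, which this page does not give.

```python
from collections import defaultdict
from statistics import pstdev


def subgroup_accuracies(y_true, y_pred, groups):
    """Return accuracy per subgroup (e.g. per background color)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_spread(acc_by_group):
    """Population standard deviation of the subgroup accuracies.

    Illustrative stand-in for a subgroup-consistency measure; not
    necessarily how the paper defines MacroStd.
    """
    return pstdev(list(acc_by_group.values()))


# Toy data (hypothetical): class labels, predictions, background colors.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 1]
groups = ["blue", "blue", "green", "green", "white", "white"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = subgroup_accuracies(y_true, y_pred, groups)
spread = accuracy_spread(per_group)

print(f"overall accuracy: {overall:.3f}")
print(f"per-subgroup accuracy: {per_group}")
print(f"spread across subgroups: {spread:.3f}")
```

In this toy case the model is perfect on one background color and at chance on the other two, so a single overall accuracy hides the disparity while the spread across subgroups exposes it.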

Code & Models

Repositories

charismaticchiu/Better-May-Not-Be-Fairer-A-Study-Study-on-Subgroup-Discrepancy-in-Image-Classification
PyTorch · Official

Videos

Better May Not Be Fairer: A Study on Subgroup Discrepancy in Image Classification · YouTube

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications

Methods: Test