DatasetEquity: Are All Samples Created Equal? In The Quest For Equity Within Datasets
Shubham Shrivastava, Xianling Zhang, Sushruth Nagesh, Armin Parchami

TL;DR
This paper addresses data imbalance in machine learning by clustering deep perceptual embeddings and using the resulting sample likelihoods to weight samples during training, which significantly improves detection of under-represented classes.
Contribution
It proposes a new method using image appearance-based likelihoods and a Generalized Focal Loss to enhance model performance on imbalanced datasets, especially for rare classes.
Findings
Over 200% AP gains on under-represented classes in KITTI.
Effective across autonomous driving datasets like KITTI and nuScenes.
Enhances state-of-the-art 3D object detection methods.
Abstract
Data imbalance is a well-known issue in the field of machine learning, attributable to the cost of data collection, the difficulty of labeling, and the geographical distribution of the data. In computer vision, bias in data distribution caused by image appearance remains highly unexplored. Compared to categorical distributions using class labels, image appearance reveals complex relationships between objects beyond what class labels provide. Clustering deep perceptual features extracted from raw pixels gives a richer representation of the data. This paper presents a novel method for addressing data imbalance in machine learning. The method computes sample likelihoods based on image appearance using deep perceptual embeddings and clustering. It then uses these likelihoods to weigh samples differently during training with a proposed function. This loss…
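The pipeline described in the abstract (embed, cluster, estimate per-sample likelihood, reweight the loss) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the embeddings are taken as a precomputed array, the clustering is a toy k-means with farthest-point initialization, the likelihood of a sample is taken to be the relative size of its cluster, and the weight `(1 - likelihood) ** gamma` is a generic focal-style choice standing in for the paper's weighting function, whose exact form is not given in the text above. All function names and the `gamma` parameter are hypothetical.

```python
import numpy as np


def assign(x, centers):
    # Index of the nearest center for every embedding.
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def farthest_point_init(x, k):
    # Deterministic init: start at x[0], then repeatedly pick the point
    # farthest from all centers chosen so far.
    centers = [x[0]]
    for _ in range(k - 1):
        stacked = np.stack(centers)
        d = np.linalg.norm(x[:, None, :] - stacked[None, :, :], axis=2).min(axis=1)
        centers.append(x[int(d.argmax())])
    return np.stack(centers).astype(float)


def cluster(x, k, iters=20):
    # Toy k-means (assumption: any clustering method could be used here).
    centers = farthest_point_init(x, k)
    for _ in range(iters):
        labels = assign(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return assign(x, centers)


def sample_likelihoods(labels, k):
    # Assumed likelihood: fraction of the dataset in the sample's cluster.
    counts = np.bincount(labels, minlength=k)
    return counts[labels] / len(labels)


def sample_weights(likelihoods, gamma=2.0, eps=1e-6):
    # Focal-style weighting (illustrative stand-in): rare samples get
    # larger weights. Weights are normalized to have mean 1.
    w = np.maximum(1.0 - likelihoods, eps) ** gamma
    return w / w.mean()


def weighted_loss(per_sample_loss, weights):
    # Weighted average of per-sample losses from any training objective.
    return float(np.mean(weights * per_sample_loss))


# Usage: 95 "common-looking" samples and 5 "rare-looking" samples.
rng = np.random.default_rng(0)
embeddings = np.concatenate([
    rng.normal(0.0, 0.1, size=(95, 2)),
    rng.normal(10.0, 0.1, size=(5, 2)),
])
labels = cluster(embeddings, k=2)
likelihoods = sample_likelihoods(labels, k=2)
weights = sample_weights(likelihoods)
```

In this toy setup, samples from the small cluster receive much larger weights than samples from the large one, so their per-sample losses contribute more to the training objective. In practice the embeddings would come from a pretrained perceptual network and the number of clusters would be a tuning choice.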
Taxonomy
Topics: Advanced Neural Network Applications · Automated Road and Building Extraction · Domain Adaptation and Few-Shot Learning
