Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning
Han-Jia Ye, Hong-You Chen, De-Chuan Zhan, Wei-Lun Chao

TL;DR
This paper investigates the over-fitting of ConvNets on minor classes in imbalanced data, identifies feature deviation as a key issue, and proposes class-dependent temperatures to improve minority class recognition.
Contribution
It introduces the concept of feature deviation in imbalanced deep learning and proposes class-dependent temperatures to mitigate its effects during training.
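As a rough illustration of what class-dependent temperatures could look like in a training loss, the sketch below divides each class logit by a per-class temperature before the softmax cross-entropy. This page does not state the paper's exact formula: the rule a_c = (N_max / N_c)^gamma, the value of gamma, and the function names cdt_temperatures and cdt_cross_entropy are all assumptions made for the example.

```python
import numpy as np

def cdt_temperatures(class_counts, gamma=0.1):
    """Assumed rule: a_c = (N_max / N_c) ** gamma, so rarer classes get larger temperatures."""
    counts = np.asarray(class_counts, dtype=float)
    return (counts.max() / counts) ** gamma

def cdt_cross_entropy(logits, labels, temperatures):
    """Softmax cross-entropy on logits divided element-wise by per-class temperatures."""
    scaled = logits / temperatures                       # broadcasts over the batch axis
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Hypothetical three-class problem with a 100:10:1 imbalance.
counts = [5000, 500, 50]
temps = cdt_temperatures(counts, gamma=0.5)
logits = np.array([[2.0, 1.0, 0.5],
                   [0.2, 0.1, 3.0]])
labels = np.array([0, 2])
loss = cdt_cross_entropy(logits, labels, temps)
```

Under this assumed rule, the major class keeps a temperature of 1 while the minor-class logits shrink, so a minor-class training example needs a larger raw decision value to reach the same loss. This is one way to read the idea of simulating feature deviation during training.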
Findings
ConvNets over-fit minor classes, whereas traditional machine learning algorithms often under-fit them.
Feature deviation means that, for minor classes, the features of test data shift away from the features of training data.
The proposed CDT method improves minority class performance on benchmarks.
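One simple way to look for feature deviation is to compare, class by class, the mean feature vector of the training data with that of the test data. The sketch below does this with a Euclidean distance. The distance choice and the helper names class_means and feature_deviation are assumptions for illustration, not necessarily the measurement used in the paper.

```python
import numpy as np

def class_means(features, labels, num_classes):
    """Mean feature vector per class; every class must appear at least once in labels."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def feature_deviation(train_feats, train_labels, test_feats, test_labels, num_classes):
    """Euclidean distance between training and test feature means, one value per class."""
    train_mu = class_means(train_feats, train_labels, num_classes)
    test_mu = class_means(test_feats, test_labels, num_classes)
    return np.linalg.norm(train_mu - test_mu, axis=1)

# Toy 2-D features: class 0 is well sampled, class 1 has a single training example.
train_feats = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 0.0]])
train_labels = np.array([0, 0, 1])
test_feats = np.array([[1.0, 0.0], [3.0, 4.0]])
test_labels = np.array([0, 1])
deviation = feature_deviation(train_feats, train_labels, test_feats, test_labels, num_classes=2)
```

In practice the features would come from the penultimate layer of the trained ConvNet. A deviation that grows as the class gets smaller would match the finding described above.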
Abstract
Classifiers trained with class-imbalanced data are known to perform poorly on test data of the "minor" classes, of which we have insufficient training data. In this paper, we investigate learning a ConvNet classifier under such a scenario. We found that a ConvNet significantly over-fits the minor classes, which is quite opposite to traditional machine learning algorithms that often under-fit minor classes. We conducted a series of analyses and discovered the feature deviation phenomenon -- the learned ConvNet generates deviated features between the training and test data of minor classes -- which explains how over-fitting happens. To compensate for the effect of feature deviation, which pushes test data toward low decision value regions, we propose to incorporate class-dependent temperatures (CDT) in training a ConvNet. CDT simulates feature deviation in the training phase, forcing the…
Taxonomy
Topics: Imbalanced Data Classification Techniques · Electricity Theft Detection Techniques · Infrastructure Maintenance and Monitoring
Methods: Test · Logistic Regression
