Delving into Semantic Scale Imbalance
Yanbiao Ma, Licheng Jiao, Fang Liu, Yuxin Li, Shuyuan Yang, Xu Liu

TL;DR
This paper introduces the concept of semantic scale imbalance to better understand model bias in long-tailed data, proposing a new measurement and training framework that improves performance across diverse datasets.
Contribution
It defines and quantifies semantic scale imbalance and develops a semantic-scale-balanced learning method that improves model performance on various datasets.
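This page does not describe how semantic scales feed into training. One common pattern, assumed here and not taken from the paper, is to reweight the per-class loss so that classes with a smaller semantic scale (less feature diversity) receive larger weights. The helpers `inverse_scale_weights` and `weighted_cross_entropy` are hypothetical, and the inverse-proportional rule with mean-one normalization is an illustrative choice rather than the authors' loss.

```python
import math


def inverse_scale_weights(scales):
    """Hypothetical per-class weights proportional to 1 / semantic scale.

    Weights are rescaled to average 1 so the overall loss magnitude stays
    comparable to an unweighted loss.
    """
    if not scales or any(s <= 0 for s in scales):
        raise ValueError("need at least one class, and every scale must be positive")
    raw = [1.0 / s for s in scales]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]


def weighted_cross_entropy(logits, label, weights):
    """Softmax cross-entropy for one sample, multiplied by its true class's weight."""
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return weights[label] * (log_sum_exp - logits[label])


if __name__ == "__main__":
    # Three classes with made-up semantic scales; class 2 is the least diverse.
    w = inverse_scale_weights([4.0, 2.0, 1.0])
    print([round(x, 3) for x in w])  # prints [0.429, 0.857, 1.714]
    print(weighted_cross_entropy([0.0, 0.0, 0.0], 2, w))
```

Normalizing the weights to a mean of 1 keeps the learning-rate scale roughly unchanged when switching from an unweighted loss; in a real pipeline the scales would be recomputed from features as training proceeds.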
Findings
Semantic scale correlates with classification performance.
The proposed method improves results on long-tailed datasets.
Model bias persists even on sample-balanced data, a phenomenon explained by semantic scale imbalance.
Abstract
Model bias triggered by long-tailed data has been widely studied. However, measures based on the number of samples cannot explain three phenomena simultaneously: (1) given enough data, additional samples yield only marginal gains in classification performance; (2) when data is insufficient, classification performance decays precipitously as the number of training samples decreases; (3) a model trained on a sample-balanced dataset still exhibits different biases toward different classes. In this work, we define and quantify the semantic scale of a class, which measures the class's feature diversity. Experiments reveal a marginal effect of semantic scale that accounts for the first two phenomena. Further, we propose a quantitative measure of semantic scale imbalance that accurately reflects model bias on multiple datasets, even on…
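The abstract does not state the formula for semantic scale. As a minimal sketch, assume it is a log-determinant volume of a class's mean-centered features, S = 0.5 * log2 det(I + (d/m) Z Z^T), where Z is the d x m matrix of the class's m centered feature vectors. This form, the centering step, and the function names `semantic_scale` and `mean_scale` are assumptions for illustration, not confirmed by the text above. Under that assumption, averaging S over synthetic Gaussian classes shows the grow-then-saturate pattern the abstract calls a marginal effect.

```python
import math
import random


def _logdet_spd(a):
    """Natural-log determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def semantic_scale(features):
    """Assumed diversity measure: 0.5 * log2 det(I + (d/m) Z Z^T).

    features: list of m feature vectors of length d (one class).
    Z is the d x m matrix of mean-centered features (centering is an assumption).
    """
    m, d = len(features), len(features[0])
    mean = [sum(f[k] for f in features) / m for k in range(d)]
    z = [[f[k] - mean[k] for f in features] for k in range(d)]  # d rows, m columns
    alpha = d / m
    gram = [
        [(1.0 if i == j else 0.0) + alpha * sum(zi * zj for zi, zj in zip(z[i], z[j]))
         for j in range(d)]
        for i in range(d)
    ]
    # I + alpha * Z Z^T is symmetric with eigenvalues >= 1, so Cholesky succeeds.
    return 0.5 * _logdet_spd(gram) / math.log(2)


def mean_scale(d, m, trials=20, seed=0):
    """Average semantic_scale over `trials` synthetic classes of m standard-normal samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        feats = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
        total += semantic_scale(feats)
    return total / trials


if __name__ == "__main__":
    d = 4
    # For unit-variance isotropic features, (1/m) Z Z^T approaches I, so the
    # measure approaches 0.5 * d * log2(1 + d) as m grows.
    print("limit:", round(0.5 * d * math.log2(1 + d), 3))
    for m in (5, 20, 100, 500):
        print(m, round(mean_scale(d, m), 3))
```

With few samples the estimate rises steeply as m grows; with many it flattens toward the limit. Under this assumed definition, part of that saturation comes from the d/m normalization, which makes S track feature spread rather than raw sample count.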
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning in Healthcare · COVID-19 diagnosis using AI
