Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models
Ashkan Khakzar, Yawei Li, Yang Zhang, Mirac Sanisoglu, Seong Tae Kim, Mina Rezaei, Bernd Bischl, Nassir Navab

TL;DR
This paper investigates how different methods for addressing data imbalance in medical imaging datasets influence the internal learned features of neural networks, revealing insights beyond standard quantitative metrics.
Contribution
It provides a detailed analysis of how handling data imbalance affects feature representations in neural networks, using multiple interpretability perspectives.
Findings
Handling data imbalance impacts feature saliency and pathology encoding.
Quantitative metrics may not fully capture differences in learned features.
A deeper understanding of a model's internal representations aids better model design.
Abstract
One challenging property lurking in medical datasets is the imbalanced data distribution, where the frequency of the samples between the different classes is not balanced. Training a model on an imbalanced dataset can introduce unique challenges to the learning problem where a model is biased towards the highly frequent class. Many methods are proposed to tackle the distributional differences and the imbalanced problem. However, the impact of these approaches on the learned features is not well studied. In this paper, we look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features. We study several popular cost-sensitive approaches for handling data imbalance and analyze the feature maps of the convolutional neural networks from multiple perspectives: analyzing the alignment of salient features with pathologies and analyzing…
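As an illustration of what a cost-sensitive approach can look like, the following is a minimal sketch of class-weighted cross-entropy with inverse-frequency weights. This is one common technique of that family; the abstract does not confirm that it is among the approaches the paper studies. The function names, the toy labels, and the predicted probabilities are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood.

    probs: one dict per sample mapping class -> predicted probability.
    Each sample's loss is scaled by the weight of its true class, so
    mistakes on rare classes cost more than mistakes on frequent ones.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy imbalanced dataset (hypothetical): nine negatives, one positive.
labels = [0] * 9 + [1]
weights = inverse_frequency_weights(labels)

# A model biased towards the frequent class predicts class 0 everywhere.
probs = [{0: 0.9, 1: 0.1} for _ in labels]

uniform = {0: 1.0, 1: 1.0}
unweighted_loss = weighted_cross_entropy(probs, labels, uniform)
weighted_loss = weighted_cross_entropy(probs, labels, weights)
```

With these toy values the weighted loss is larger than the unweighted one, because the single misclassified rare-class sample contributes as much total weight as the nine frequent-class samples combined. That added penalty is what pushes training away from the majority-class bias the abstract describes.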
Taxonomy
Topics: Imbalanced Data Classification Techniques · Medical Coding and Health Information · Machine Learning in Healthcare
