Empirical Similarity for Absent Data Generation in Imbalanced Classification
Arash Pourhabib

TL;DR
This paper introduces SBIC (Similarity-based Imbalanced Classification), an empirical-similarity method that models absent minority-class data to improve classification on highly imbalanced datasets. It performs comparably to existing techniques and outperforms them in some cases.
Contribution
SBIC jointly learns the weights of an empirical similarity function and the locations of absent (minority-class) data points. Synthetic data generation is thus embedded in the algorithm itself, and the original training dataset is left unmodified.
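One way to write this joint problem down is sketched below. It rests on assumptions that are not taken from the paper: a Gaussian-type similarity with nonnegative feature weights w_k, labels y_i in {0, 1} with 1 denoting the minority class, m absent points z_1, ..., z_m that all carry the minority label, and a leave-one-out squared loss with class-balanced weights c_i, where n_{y_i} is the number of training points in the class of y_i. The paper's exact objective may differ.

```latex
s_w(x, x') = \exp\Big(-\sum_{k=1}^{d} w_k \,(x_k - x'_k)^2\Big), \qquad w_k \ge 0

\hat{y}_i(w, Z) =
  \frac{\sum_{j \ne i} s_w(x_i, x_j)\, y_j + \sum_{l=1}^{m} s_w(x_i, z_l)}
       {\sum_{j \ne i} s_w(x_i, x_j) + \sum_{l=1}^{m} s_w(x_i, z_l)}

\min_{w \ge 0,\; Z = (z_1, \dots, z_m)} \;
  \sum_{i=1}^{n} c_i \big(y_i - \hat{y}_i(w, Z)\big)^2,
  \qquad c_i = \frac{1}{n_{y_i}}
```

Because the absent points enter only the numerator and denominator of the similarity-weighted average, moving them changes the predicted scores without adding any rows to the training data.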
Findings
SBIC performs comparably to existing methods on imbalanced datasets.
In some cases, SBIC outperforms traditional classification techniques.
SBIC effectively handles the imbalance by modeling absent minority-class data.
Abstract
When the training data in a two-class classification problem is dominated by one class, most classification techniques fail to correctly identify the data points belonging to the underrepresented class. We propose Similarity-based Imbalanced Classification (SBIC), a method that learns patterns in the training data based on an empirical similarity function. To take the imbalanced structure of the training data into account, SBIC uses the concept of absent data, i.e., data from the minority class that can help locate the boundary between the two classes more accurately. SBIC simultaneously optimizes the weights of the empirical similarity function and finds the locations of absent data points. As such, SBIC uses an embedded mechanism for synthetic data generation which does not modify the training dataset, but alters the algorithm to suit imbalanced datasets. Therefore, SBIC uses the ideas of both major…
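To make the idea of jointly optimizing similarity weights and absent-point locations concrete, here is a minimal sketch. It is not the author's implementation. Several choices are illustrative assumptions: the Gaussian-type similarity, treating absent points as extra neighbors with minority label 1, the class-balanced leave-one-out squared loss, and plain gradient descent with finite-difference gradients and step halving. The function names `similarity`, `predict_scores`, `loss` and `fit_sbic_sketch` are hypothetical.

```python
import numpy as np


def similarity(A, B, w):
    """Pairwise similarity s_w(a, b) = exp(-sum_k w_k (a_k - b_k)^2).

    Returns an |A| x |B| matrix. The Gaussian-type form is an assumption.
    """
    diff = A[:, None, :] - B[None, :, :]          # shape (|A|, |B|, d)
    return np.exp(-((diff ** 2) @ w))             # weighted squared distance


def predict_scores(X_query, X_train, y_train, w, Z, exclude_self=False):
    """Similarity-weighted average of labels; absent points Z vote for class 1."""
    S = similarity(X_query, X_train, w)
    if exclude_self:
        np.fill_diagonal(S, 0.0)                  # leave-one-out: no self-vote
    S_abs = similarity(X_query, Z, w)
    num = S @ y_train + S_abs.sum(axis=1)         # absent points carry label 1
    den = S.sum(axis=1) + S_abs.sum(axis=1)
    return num / (den + 1e-12)                    # score lies in [0, 1]


def loss(params, X, y, d, m):
    """Class-balanced leave-one-out squared error (hypothetical objective)."""
    w = np.exp(params[:d])                        # log-parameterization keeps w > 0
    Z = params[d:].reshape(m, d)
    y_hat = predict_scores(X, X, y, w, Z, exclude_self=True)
    c = np.where(y == 1, 1.0 / np.sum(y == 1), 1.0 / np.sum(y == 0))
    return float(np.sum(c * (y - y_hat) ** 2))


def fit_sbic_sketch(X, y, m=3, steps=100, lr=0.5, eps=1e-6, seed=0):
    """Jointly fit weights w and m absent points Z by gradient descent."""
    rng = np.random.default_rng(seed)
    _, d = X.shape
    minority = X[y == 1]
    # Start the absent points near randomly chosen minority points.
    Z0 = minority[rng.integers(len(minority), size=m)] + 0.1 * rng.standard_normal((m, d))
    params = np.concatenate([np.zeros(d), Z0.ravel()])
    cur = loss(params, X, y, d, m)
    history = [cur]
    for _ in range(steps):
        # Central finite differences: slow, but simple and easy to check.
        grad = np.zeros_like(params)
        for k in range(params.size):
            e = np.zeros_like(params)
            e[k] = eps
            grad[k] = (loss(params + e, X, y, d, m) - loss(params - e, X, y, d, m)) / (2 * eps)
        candidate = params - lr * grad
        cand_loss = loss(candidate, X, y, d, m)
        if cand_loss < cur:
            params, cur = candidate, cand_loss    # accept the step
        else:
            lr *= 0.5                             # backtrack: shrink the step size
        history.append(cur)
    return np.exp(params[:d]), params[d:].reshape(m, d), history


# Toy imbalanced data: 40 majority points vs 5 minority points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 2)), rng.normal(2.0, 0.5, size=(5, 2))])
y = np.concatenate([np.zeros(40), np.ones(5)])

w, Z, history = fit_sbic_sketch(X, y)
scores = predict_scores(X, X, y, w, Z)
print("loss:", history[0], "->", history[-1])
print("points scored as minority:", int(np.sum(scores > 0.5)))
```

The training data `X, y` are never changed. Only the weights `w` and the absent points `Z` move, which mirrors the "embedded" synthetic data generation described in the abstract. Finite differences are used here only for readability. A practical version would use analytic or automatic-differentiation gradients and a proper optimizer.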
