Enhancing Synthetic Oversampling for Imbalanced Datasets Using Proxima-Orion Neighbors and q-Gaussian Weighting Technique
Pankaj Yadav, Vivek Vijay, and Gulshan Sihag

TL;DR
This paper introduces a novel oversampling method for imbalanced datasets that uses Proxima and Orion neighbors along with q-Gaussian weighting to generate diverse synthetic minority instances, improving classification performance.
Contribution
The paper presents a new oversampling algorithm combining Proxima-Orion neighbor selection and q-Gaussian weighting, enhancing minority class representation in imbalanced datasets.
Findings
The proposed PO-QG algorithm improves overall classification performance relative to five existing oversampling algorithms.
A Wilcoxon signed-rank test is used for the comparison against those five algorithms.
The evaluation covers 42 datasets from KEEL and eight datasets from the UCI ML repository.
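The comparison described above is a paired one: each dataset yields one score per algorithm, and the abstract names the Wilcoxon signed-rank test as the tool. The sketch below shows how such a test could be run with SciPy. The function name `compare_paired_scores`, the one-sided alternative, the 0.05 threshold, and the placeholder scores are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np
from scipy.stats import wilcoxon


def compare_paired_scores(proposed, baseline, alpha=0.05):
    """Run a one-sided Wilcoxon signed-rank test on per-dataset scores.

    Hypothetical helper: tests whether `proposed` tends to score higher
    than `baseline` on the same datasets. `alpha` is an assumed threshold.
    """
    proposed = np.asarray(proposed, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    if proposed.shape != baseline.shape:
        raise ValueError("scores must be paired: one value per dataset for each method")
    # alternative="greater" asks whether proposed - baseline is shifted above zero.
    result = wilcoxon(proposed, baseline, alternative="greater")
    return result.statistic, result.pvalue, bool(result.pvalue < alpha)


if __name__ == "__main__":
    # Placeholder scores generated at random, NOT results from the paper.
    rng = np.random.default_rng(0)
    baseline_scores = rng.uniform(0.60, 0.90, size=20)
    proposed_scores = baseline_scores + rng.normal(0.02, 0.01, size=20)
    stat, p_value, significant = compare_paired_scores(proposed_scores, baseline_scores)
    print(f"statistic={stat:.1f}, p-value={p_value:.4g}, significant={significant}")
```

With five baselines, the same call would be repeated once per baseline; whether and how the authors correct for multiple comparisons is not stated in the summary.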
Abstract
In this article, we propose a novel oversampling algorithm to increase the number of instances of the minority class in an imbalanced dataset. We select two instances, Proxima and Orion, from the set of all minority class instances, based on a combination of relative distance weights and density estimation of majority class instances. Furthermore, the q-Gaussian distribution is used as a weighting mechanism to produce new synthetic instances that improve representation and diversity. We conduct a comprehensive experiment on 42 datasets extracted from the KEEL software and eight datasets from the UCI ML repository to evaluate the usefulness of the proposed (PO-QG) algorithm. The Wilcoxon signed-rank test is used to compare the proposed algorithm with five other existing algorithms. The test results show that the proposed technique improves the overall classification performance. We also…
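To make the q-Gaussian weighting idea concrete, here is a minimal sketch. It uses the standard unnormalized q-Gaussian kernel, e_q(-βd²), where e_q(x) = [1 + (1 - q)x]^(1/(1 - q)) and e_q reduces to the ordinary exponential at q = 1. Everything beyond that kernel is an assumption: the summary does not say how Proxima and Orion are chosen or how the weights enter the interpolation, so the function `synthesize`, the parameters `q` and `beta`, and the interpolation rule are hypothetical and should not be read as the PO-QG algorithm itself.

```python
import numpy as np


def q_exponential(x, q):
    """q-exponential e_q(x); equals exp(x) when q == 1.

    Intended here for x <= 0 only, which is all the kernel below needs.
    """
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.exp(x)
    base = np.maximum(1.0 + (1.0 - q) * x, 0.0)  # cut-off keeps the base non-negative
    return base ** (1.0 / (1.0 - q))


def q_gaussian_weight(distance, q=1.5, beta=1.0):
    """Unnormalized q-Gaussian kernel of a distance (q and beta are assumed values)."""
    return q_exponential(-beta * np.asarray(distance, dtype=float) ** 2, q)


def synthesize(x, proxima, orion, q=1.5, beta=1.0, rng=None):
    """Hypothetical synthetic-instance rule, NOT the paper's exact procedure.

    Takes a minority instance `x` and two already-selected neighbors, weights
    each neighbor by the q-Gaussian of its distance to `x`, and steps a random
    fraction of the way from `x` toward the weighted neighbor average.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, proxima, orion = (np.asarray(v, dtype=float) for v in (x, proxima, orion))
    weights = np.array([
        q_gaussian_weight(np.linalg.norm(proxima - x), q, beta),
        q_gaussian_weight(np.linalg.norm(orion - x), q, beta),
    ])
    total = weights.sum()
    # For q < 1 the kernel has compact support, so both weights can be zero.
    weights = weights / total if total > 0 else np.array([0.5, 0.5])
    target = weights[0] * proxima + weights[1] * orion
    step = rng.uniform(0.0, 1.0)
    return x + step * (target - x)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    print(synthesize([0.0, 0.0], [1.0, 0.0], [0.0, 2.0], rng=rng))
```

Because the two weights sum to one and the step lies in [0, 1], every point this sketch generates stays inside the triangle spanned by the instance and its two neighbors. For q > 1 the kernel has heavier tails than a Gaussian, so the farther neighbor keeps more influence than it would at q = 1.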
Taxonomy
Topics: Imbalanced Data Classification Techniques · Advanced Statistical Methods and Models · Artificial Intelligence in Healthcare
Methods: Sparse Evolutionary Training
