Utilizing Nearest-Neighbor Clustering for Addressing Imbalanced Datasets in Bioengineering
Chih-Ming Huang, Chun-Hung Lin, Chuan-Sheng Hung, Wun-Hui Zeng, You-Cheng Zheng, Chih-Min Tsai

TL;DR
This paper introduces a clustering-based one-class classifier, the Location-based Nearest-Neighbor (LBNN) algorithm, to improve classification performance on imbalanced datasets such as those found in medical diagnosis.
Contribution
The proposed LBNN algorithm uses K-means with outlier removal (KMOR) to detect outliers in the target class, improving performance on imbalanced data.
Findings
The LBNN algorithm outperforms traditional models in precision, recall, and G-means on imbalanced datasets.
Experiments on KEEL datasets and real medical data confirm the effectiveness of the proposed method.
Replacing the inter-quartile range with KMOR improves outlier identification in one-class problems.
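The metrics named in these findings can be computed from a confusion matrix. The sketch below is illustrative only: this summary does not give the paper's exact formulas, so it assumes the common definition of the G-mean (listed above as G-means) as the geometric mean of recall on the positive class and specificity on the negative class. The function name `imbalance_metrics` and the counts are hypothetical.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Precision, recall, specificity, G-mean, and accuracy from raw counts.

    Assumption: G-mean = sqrt(recall * specificity), the usual definition
    for imbalanced data; the paper's own definition is not quoted here.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "g_mean": math.sqrt(recall * specificity),
        "accuracy": accuracy,
    }


# Hypothetical counts: 10 minority samples, 90 majority samples.
useful = imbalance_metrics(tp=8, fp=2, tn=88, fn=2)
# A classifier that always predicts the majority class.
trivial = imbalance_metrics(tp=0, fp=0, tn=90, fn=10)
print(useful["g_mean"], trivial["accuracy"], trivial["g_mean"])
```

The second call shows why accuracy alone is a weak measure here: a model that never predicts the minority class still scores high accuracy on these made-up counts, while its G-mean drops to zero.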
Abstract
Imbalanced classification is common in scenarios such as fault diagnosis, intrusion detection, and medical diagnosis, where abnormal data are difficult to obtain. This article addresses a one-class problem by implementing and refining the One-Class Nearest-Neighbor (OCNN) algorithm. The original inter-quartile range mechanism is replaced with the K-means with outlier removal (KMOR) algorithm to identify outliers in the target class efficiently. Parameters are then optimized by treating these outliers as non-target-class samples. A new algorithm, the Location-based Nearest-Neighbor (LBNN) algorithm, clusters one-class training data using KMOR and calculates the farthest distance and percentile for each test data point to determine whether it belongs to the target class. Experiments cover parameter studies, validation on eight standard imbalanced datasets from KEEL, and three applications on real…
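The pipeline described in the abstract (cluster the target class while setting outliers aside, then judge each test point by distance) can be illustrated with a rough sketch. Nothing below is the authors' implementation: `kmor_like` is a simplified stand-in for KMOR, not the published algorithm, and the decision rule (accept a test point if its distance to the nearest cluster center is within a chosen percentile of that cluster's training distances, with `pct=100` meaning the farthest member) is only one assumed reading of "farthest distance and percentile". All function names, parameters, and data are hypothetical.

```python
import math
import random


def kmor_like(points, k, gamma=3.0, max_outlier_frac=0.1, iters=20, seed=0):
    """Simplified k-means with an outlier group. NOT the published KMOR."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # initial centers drawn from the data
    n0 = int(max_outlier_frac * len(points))   # cap on the number of outliers
    outliers = set()
    for _ in range(iters):
        # Nearest center and squared distance for every point.
        nearest = []
        for p in points:
            d2 = [math.dist(p, c) ** 2 for c in centers]
            j = min(range(k), key=d2.__getitem__)
            nearest.append((j, d2[j]))
        # Average squared distance over the points currently treated as inliers.
        inlier_d2 = [d for i, (_, d) in enumerate(nearest) if i not in outliers]
        avg = sum(inlier_d2) / len(inlier_d2)
        # Flag points farther than gamma * avg, farthest first, at most n0.
        far = [i for i, (_, d) in enumerate(nearest) if d > gamma * avg]
        far.sort(key=lambda i: nearest[i][1], reverse=True)
        outliers = set(far[:n0])
        # Recompute each center from its inlier members only.
        for j in range(k):
            members = [points[i] for i, (c, _) in enumerate(nearest)
                       if c == j and i not in outliers]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    labels = [-1 if i in outliers else nearest[i][0] for i in range(len(points))]
    return centers, labels


def fit_thresholds(points, centers, labels, pct=95):
    """Per-cluster cutoff: the pct-th percentile of member-to-center distances."""
    thresholds = []
    for j, c in enumerate(centers):
        d = sorted(math.dist(p, c) for p, lab in zip(points, labels) if lab == j)
        if not d:
            thresholds.append(0.0)             # empty cluster accepts nothing but its center
            continue
        idx = max(0, min(len(d) - 1, math.ceil(pct / 100 * len(d)) - 1))
        thresholds.append(d[idx])
    return thresholds


def is_target(x, centers, thresholds):
    """Assumed rule: accept x if it lies within the cutoff of its nearest cluster."""
    j = min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))
    return math.dist(x, centers[j]) <= thresholds[j]


# Toy target class: a 5x5 grid around the origin plus one planted outlier.
grid = (-1.0, -0.5, 0.0, 0.5, 1.0)
train = [(x, y) for x in grid for y in grid] + [(50.0, 50.0)]
centers, labels = kmor_like(train, k=1)
thresholds = fit_thresholds(train, centers, labels, pct=100)
print(labels[-1], is_target((0.2, 0.1), centers, thresholds),
      is_target((5.0, 5.0), centers, thresholds))
```

In this toy setup the planted point is set aside during clustering, so it does not inflate the cluster's cutoff. Lowering `pct` tightens the acceptance region and rejects more borderline points.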
Taxonomy
Topics: Literature, Culture, and Criticism · Cultural, Media, and Literary Studies · Linguistics and Education Research
