HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques
Angelos Chatzimparmpas, Fernando V. Paulovich, Andreas Kerren

TL;DR
HardVis is a visual analytics tool that helps users manage instance hardness in imbalanced datasets by enabling informed sampling decisions to improve machine learning model performance.
Contribution
This paper introduces HardVis, a novel visual analytics system that facilitates the selection and validation of undersampling and oversampling techniques based on instance hardness in imbalanced classification.
Findings
HardVis effectively helps users identify and sample difficult instances.
The system improves model accuracy by balancing data distribution.
User feedback indicates high usability and usefulness of HardVis.
Abstract
Despite the tremendous advances in machine learning (ML), training with imbalanced data still poses challenges in many real-world applications. Among a series of diverse techniques to solve this problem, sampling algorithms are regarded as an efficient solution. However, the problem is more fundamental, with many works emphasizing the importance of instance hardness. This issue refers to the significance of managing unsafe or potentially noisy instances that are more likely to be misclassified and serve as the root cause of poor classification performance. This paper introduces HardVis, a visual analytics system designed to handle instance hardness mainly in imbalanced classification scenarios. Our proposed system assists users in visually comparing different distributions of data types, selecting types of instances based on local characteristics that will later be affected by the…
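To make the notion of instance hardness concrete, here is a minimal sketch, not HardVis's own method. It assumes hardness can be approximated by the fraction of an instance's k nearest neighbors that carry a different class label, which is one common local measure. The function names `knn_hardness` and `undersample_hard_majority`, the default `k`, and the `threshold` are hypothetical choices made for illustration only.

```python
from collections import Counter
from math import dist


def knn_hardness(points, labels, k=5):
    """Assumed hardness measure: share of the k nearest neighbors
    whose label differs from the instance's own label."""
    scores = []
    for i, p in enumerate(points):
        # Rank every other instance by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors = others[:k]
        disagree = sum(labels[j] != labels[i] for j in neighbors)
        scores.append(disagree / len(neighbors))
    return scores


def undersample_hard_majority(points, labels, threshold=0.5, k=5):
    """Illustrative undersampling: drop majority-class instances
    whose hardness score exceeds the threshold."""
    majority = Counter(labels).most_common(1)[0][0]
    scores = knn_hardness(points, labels, k)
    keep = [
        i
        for i, (y, s) in enumerate(zip(labels, scores))
        if not (y == majority and s > threshold)
    ]
    return [points[i] for i in keep], [labels[i] for i in keep]
```

A score near 0 suggests an instance surrounded by its own class, while a score near 1 suggests an unsafe or potentially noisy instance. The threshold that separates the two is a judgment call, and it is the kind of decision the abstract describes users making visually rather than by a fixed rule.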
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications
