A Novel Data Pre-processing Technique: Making Data Mining Robust to Different Units and Scales of Measurement
Arbind Agrahari Baniya, Sunil Aryal, Santosh KC

TL;DR
This paper introduces ARES, a new data pre-processing method based on average ranks over an ensemble of sub-samples, which makes data mining tasks such as classification and anomaly detection more robust to different units and scales of measurement.
Contribution
The paper proposes ARES, an innovative pre-processing technique that improves data normalization by combining ranks over multiple sub-samples, outperforming traditional methods in consistency and effectiveness.
Findings
ARES yields more consistent outcomes across algorithms and datasets.
ARES outperforms min-max normalization and traditional rank transformation.
ARES provides better or comparable results in classification and anomaly detection.
Abstract
Many existing data mining algorithms use feature values directly in their models, making them sensitive to the units/scales used to measure or represent data. Pre-processing data with rank transformation has been suggested as a potential solution to this issue. However, data pre-processed with rank transformation is uniformly distributed, which may not be very useful in many data mining applications. In this paper, we present a better and more effective alternative based on ranks over multiple sub-samples of data. We call the proposed pre-processing technique ARES: Average Rank over an Ensemble of Sub-samples. Our empirical results with widely used data mining algorithms for classification and anomaly detection on a wide range of data sets suggest that ARES yields more consistent task-specific outcomes across various algorithms and data sets. In addition to…
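To make the contrast in the abstract concrete, here is a minimal Python sketch using only the standard library. It assumes one plausible reading of "average rank over an ensemble of sub-samples": the rank of a value within a sub-sample is the fraction of sub-sample points less than or equal to it, and the ARES score averages these ranks over several random sub-samples drawn without replacement. The function names and the parameters `n_subsamples`, `subsample_size`, and `seed` are hypothetical illustrations, not the authors' implementation or defaults. Min-max normalization and a whole-column rank transformation are included for comparison.

```python
import bisect
import random


def min_max(values):
    """Min-max normalization: rescale linearly so the column spans [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def rank_transform(values):
    """Whole-column rank transformation, scaled to (0, 1].

    Each value is replaced by the fraction of column values <= it, so
    distinct values map to evenly spaced ranks 1/n, 2/n, ..., 1.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [bisect.bisect_right(ordered, x) / n for x in values]


def ares_transform(values, n_subsamples=10, subsample_size=8, seed=0):
    """Hypothetical sketch of an average-rank-over-sub-samples transform.

    ASSUMPTION: per-sub-sample rank = fraction of sub-sample points <= x;
    this is an illustrative reading, not the paper's exact definition.
    """
    rng = random.Random(seed)  # fixed seed -> same sub-sample positions each call
    size = min(subsample_size, len(values))
    totals = [0.0] * len(values)
    for _ in range(n_subsamples):
        # Draw a sub-sample without replacement and sort it for binary search.
        sample = sorted(rng.sample(values, size))
        for i, x in enumerate(values):
            totals[i] += bisect.bisect_right(sample, x) / size
    # Average the per-sub-sample ranks over the ensemble.
    return [t / n_subsamples for t in totals]


if __name__ == "__main__":
    data = [3.2, 1.5, 8.0, 4.4, 2.9, 7.1, 5.5, 6.3, 0.7, 9.9, 2.2, 4.0]
    cubed = [x ** 3 for x in data]  # same feature on a nonlinear scale
    print(rank_transform(data) == rank_transform(cubed))  # True
    print(ares_transform(data) == ares_transform(cubed))  # True
    print(min_max(data) == min_max(cubed))                # False
```

Because each rank only counts how many points fall at or below a value, both rank-based transforms give identical output under any strictly increasing change of scale (here, cubing), provided the ARES sub-samples are drawn with the same seed; min-max normalization does not share this property. The sketch illustrates that invariance only and is not meant to reproduce the paper's exact method or its empirical results.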
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Network Security and Intrusion Detection
