New PCA-based Category Encoder for Cybersecurity and Processing Data in IoT Devices
Hamed Farkhari, Joseanne Viana, Luis Miguel Campos, Pedro Sebastiao, Luis Bernardo

TL;DR
This paper introduces a PCA-based category encoding method that enhances machine learning performance on high-cardinality categorical data, especially in cybersecurity and IoT contexts, by reducing data dimensionality and computational load.
Contribution
A novel PCA-based encoding approach that converts categorical variables into numerical features with minimal data expansion, improving ML accuracy and efficiency in cybersecurity and IoT applications.
Findings
Achieves highest accuracy and AUC compared to 17 encoders.
Reduces number of features while maintaining performance.
Improves processing-time efficiency on IoT devices.
Abstract
Increasing the cardinality of categorical variables might decrease the overall performance of machine learning (ML) algorithms. This paper presents a novel computational preprocessing method to convert categorical variables to numerical variables for ML algorithms. It uses a supervised binary classifier to extract additional context-related features from the categorical values. Up to two numerical variables per categorical variable are created, depending on the compression achieved by Principal Component Analysis (PCA). The method requires two hyperparameters: a threshold related to the distribution of categories in the variables, and the PCA representativeness. This paper applies the proposed approach to the well-known NSL-KDD cybersecurity dataset to select and convert three categorical features to numerical features. After choosing the threshold parameter, we use conditional probabilities to…
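The general idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated before it says how the conditional probabilities are used, so the two per-category statistics below (P(y=1 | category) and P(category | y=1)), the pooling of rare categories under `threshold`, and the reading of "PCA representativeness" as a cumulative explained-variance target are all assumptions. The function name `pca_category_encode` and the `"__rare__"` bucket are hypothetical.

```python
import numpy as np

def pca_category_encode(categories, labels, threshold=0.01, representativeness=0.95):
    """Hypothetical sketch: encode one categorical column as at most two numeric columns."""
    cats = np.asarray(categories, dtype=str)
    y = np.asarray(labels, dtype=float)  # binary target, 0.0 or 1.0

    # Assumption: categories with relative frequency below `threshold` share one bucket.
    _, inverse, counts = np.unique(cats, return_inverse=True, return_counts=True)
    rare = (counts / cats.size) < threshold
    cats = np.where(rare[inverse], "__rare__", cats)

    # Per-category statistics derived from the binary label (assumed choice of features).
    _, inverse, counts = np.unique(cats, return_inverse=True, return_counts=True)
    positives = np.bincount(inverse, weights=y, minlength=counts.size)
    p_pos_given_cat = positives / counts              # P(y=1 | category)
    p_cat_given_pos = positives / max(y.sum(), 1.0)   # P(category | y=1)
    X = np.column_stack([p_pos_given_cat, p_cat_given_pos])[inverse]

    # PCA via SVD on the centered two-column matrix.
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = S ** 2
    if variance.sum() == 0.0:
        return Xc[:, :1]  # constant column: nothing to compress

    # Keep the fewest components whose cumulative variance ratio reaches the target.
    cumulative = np.cumsum(variance / variance.sum())
    k = min(int(np.searchsorted(cumulative, representativeness)) + 1, 2)
    return Xc @ Vt[:k].T
```

Calling `pca_category_encode(protocol_column, is_attack, threshold=0.01, representativeness=0.95)` on a column such as NSL-KDD's `protocol_type` would return an array with one row per record and one or two columns, depending on how much variance the first principal component captures. Rows that share a category always receive the same encoding, so the mapping can be stored as a small lookup table for use on a constrained device.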
Taxonomy
Topics: Face and Expression Recognition · Neural Networks and Applications · Anomaly Detection Techniques and Applications
Methods: Principal Components Analysis
