Towards Trustworthy Keylogger detection: A Comprehensive Analysis of Ensemble Techniques and Feature Selections through Explainable AI

Monirul Islam Mahmud

arXiv:2505.16103·cs.LG·May 23, 2025

TL;DR

This paper evaluates various machine learning and ensemble techniques, combined with feature selection and explainable AI, to improve the accuracy and interpretability of keylogger detection systems using a public dataset.

Contribution

It provides a comprehensive analysis of traditional and ensemble machine learning models, feature selection methods, and explainability techniques for keylogger detection.

Findings

01. AdaBoost achieved 99.76% accuracy and near-perfect classification.

02. Fisher Score combined with ensemble methods yielded the best results.

03. Explainable AI techniques like SHAP and LIME enhanced model interpretability.
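
To make the Fisher Score finding concrete, the sketch below computes one common textbook form of the score: for each feature, the class-size-weighted spread of the class means divided by the class-size-weighted within-class variance. This is an illustration only. The page does not state which variant the paper uses, and the function name `fisher_scores` and the toy data are hypothetical, not taken from the Kaggle dataset.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher Score per feature column.

    X is a list of rows (each row a list of numbers), y the class labels.
    Assumed definition (a common textbook form, not confirmed by the paper):
        score_j = sum_k n_k * (mean_kj - mean_j)^2 / sum_k n_k * var_kj
    """
    # Group the rows by class label.
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)

        between = 0.0  # weighted spread of class means around the overall mean
        within = 0.0   # weighted within-class (population) variance
        for rows in rows_by_class.values():
            values = [row[j] for row in rows]
            n_k = len(values)
            mean_k = sum(values) / n_k
            var_k = sum((v - mean_k) ** 2 for v in values) / n_k
            between += n_k * (mean_k - overall_mean) ** 2
            within += n_k * var_k

        if within > 0:
            scores.append(between / within)
        elif between > 0:
            # Constant within every class yet different across classes.
            scores.append(float("inf"))
        else:
            # Constant everywhere: the feature carries no class information.
            scores.append(0.0)
    return scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X_toy = [[1.0, 5.0], [2.0, 1.0], [9.0, 4.0], [10.0, 2.0]]
y_toy = [0, 0, 1, 1]
toy_scores = fisher_scores(X_toy, y_toy)
print(toy_scores)
```

A higher score marks a feature whose class means sit far apart relative to its within-class scatter, so ranking features by this score and keeping the top few is one plausible way to apply it before training an ensemble.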

Abstract

Keylogger detection involves monitoring for unusual system behaviors, such as delays between typing and character display, and analyzing network traffic patterns for data exfiltration. In this study, we provide a comprehensive analysis of keylogger detection with traditional machine learning models (SVC, Random Forest, Decision Tree, XGBoost, AdaBoost, Logistic Regression, and Naive Bayes) and advanced ensemble methods including Stacking, Blending, and Voting. Moreover, feature selection approaches such as Information Gain, Lasso L1, and Fisher Score are thoroughly assessed to improve predictive performance and lower computational complexity. The publicly available Keylogger Detection dataset from Kaggle is used in this project. In addition to accuracy-based classification, this study implements model interpretation using Explainable AI (XAI) techniques, namely SHAP…
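
As an illustration of the Voting ensemble named in the abstract, the sketch below combines the label predictions of several already-trained classifiers by simple majority (hard voting). It is a minimal sketch under stated assumptions: the abstract does not say whether the paper uses hard or soft voting, and the function name `hard_vote` and the example predictions are hypothetical.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label lists into one majority-vote label list.

    predictions_per_model: one list of predicted labels per model, all
    covering the same samples in the same order.
    Ties go to the label that appears first among the models' votes,
    because Counter.most_common keeps first-seen order for equal counts.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("every model must predict the same number of samples")

    combined = []
    for sample_votes in zip(*predictions_per_model):
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three base models on four samples
# (1 = keylogger, 0 = benign).
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
ensemble_labels = hard_vote([model_a, model_b, model_c])
print(ensemble_labels)
```

Stacking and Blending differ from this scheme in that they train a second-level model on the base models' outputs instead of counting votes.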

Taxonomy

Topics: Advanced Malware Detection Techniques · Internet Traffic Analysis and Secure E-voting · Spam and Phishing Detection

Methods: Logistic Regression · Feature Selection · Shapley Additive Explanations · Local Interpretable Model-Agnostic Explanations