Towards Trustworthy Keylogger detection: A Comprehensive Analysis of Ensemble Techniques and Feature Selections through Explainable AI
Monirul Islam Mahmud

TL;DR
This paper evaluates various machine learning and ensemble techniques, combined with feature selection and explainable AI, to improve the accuracy and interpretability of keylogger detection systems using a public dataset.
Contribution
It provides a comprehensive analysis of traditional and ensemble machine learning models, feature selection methods, and explainability techniques for keylogger detection.
Findings
AdaBoost achieved 99.76% accuracy and near-perfect classification.
Fisher Score combined with ensemble methods yielded the best results.
Explainable AI techniques like SHAP and LIME enhanced model interpretability.
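The Fisher Score named in the findings ranks each feature by the ratio of between-class scatter to within-class scatter: F_j = Σ_k n_k (μ_kj − μ_j)² / Σ_k n_k σ²_kj. The following is a minimal NumPy sketch of that standard formula. It is not the paper's code. The helper names `fisher_score` and `top_k_features` are hypothetical, and the two-column synthetic data only stands in for the Kaggle dataset. In the pipeline the findings describe, the selected columns would then be passed to an ensemble such as AdaBoost. That step is omitted here.

```python
import numpy as np


def fisher_score(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return one Fisher Score per column of X (higher = more discriminative).

    F_j = sum_k n_k * (mu_kj - mu_j)^2 / sum_k n_k * var_kj
    k ranges over class labels, n_k is the class size, mu_kj and var_kj are
    the class-conditional mean and population variance of feature j, and
    mu_j is the overall mean of feature j.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])  # between-class scatter per feature
    within = np.zeros(X.shape[1])   # within-class scatter per feature
    for label in np.unique(y):
        X_k = X[y == label]
        n_k = X_k.shape[0]
        between += n_k * (X_k.mean(axis=0) - overall_mean) ** 2
        within += n_k * X_k.var(axis=0)
    # Guard against division by zero for features that are constant within every class.
    return between / np.maximum(within, 1e-12)


def top_k_features(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring features, best first (hypothetical helper)."""
    return np.argsort(fisher_score(X, y))[::-1][:k]


# Synthetic stand-in data (assumption): column 1 shifts with the label, column 0 is pure noise.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
informative = y + rng.normal(0.0, 0.3, size=n)
noise = rng.normal(0.0, 1.0, size=n)
X = np.column_stack([noise, informative])

print(fisher_score(X, y))
print(top_k_features(X, y, 1))  # the label-dependent column (index 1) ranks first
```

Because the score is a per-feature filter, it costs one pass over the data and is independent of the downstream classifier. That independence is what lets the same ranking be reused across several ensemble models.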
Abstract
Keylogger detection involves monitoring for unusual system behaviors, such as delays between typing and character display, and analyzing network traffic patterns for signs of data exfiltration. In this study, we provide a comprehensive analysis of keylogger detection with traditional machine learning models (SVC, Random Forest, Decision Tree, XGBoost, AdaBoost, Logistic Regression and Naive Bayes) and advanced ensemble methods, including Stacking, Blending and Voting. Moreover, feature selection approaches such as Information Gain, Lasso L1 and Fisher Score are thoroughly assessed to improve predictive performance and lower computational complexity. The Keylogger Detection dataset, publicly available on Kaggle, is used in this project. In addition to accuracy-based classification, this study interprets the models using Explainable AI (XAI) techniques, namely SHAP…
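The abstract pairs filter and embedded feature selection with three ensemble strategies. The following is a minimal scikit-learn sketch of that combination. Everything in it is an assumption rather than the paper's setup: `make_classification` data replaces the Kaggle dataset, whose schema is not given here; mutual information (`mutual_info_classif`) stands in for Information Gain; an L1-penalized logistic regression inside `SelectFromModel` stands in for Lasso L1; and the base learners, `k=8` and `C=0.5` are illustrative. scikit-learn ships `VotingClassifier` and `StackingClassifier` but no blending estimator, so the hypothetical helper `blend_fit_predict` implements blending by hand.

```python
from functools import partial

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.feature_selection import SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Keylogger Detection dataset (assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)


def base_learners():
    """Fresh, unfitted base models for every ensemble (illustrative choice)."""
    return [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("ada", AdaBoostClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ]


selectors = {
    # Mutual information as an Information Gain style filter.
    "info_gain": SelectKBest(partial(mutual_info_classif, random_state=0), k=8),
    # Keep features whose L1-penalized logistic coefficients are non-zero.
    "lasso_l1": SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.5)),
}

ensembles = {
    "voting": lambda: VotingClassifier(base_learners(), voting="soft"),
    "stacking": lambda: StackingClassifier(
        base_learners(), final_estimator=LogisticRegression(max_iter=1000)),
}


def blend_fit_predict(X_tr, y_tr, X_te, holdout=0.3):
    """Blending: fit base models on one split, fit the meta-model on their
    predicted probabilities for a held-out split (no cross-validation)."""
    X_a, X_b, y_a, y_b = train_test_split(
        X_tr, y_tr, test_size=holdout, random_state=0, stratify=y_tr)
    fitted = [est.fit(X_a, y_a) for _, est in base_learners()]
    meta_X = np.column_stack([m.predict_proba(X_b)[:, 1] for m in fitted])
    meta = LogisticRegression(max_iter=1000).fit(meta_X, y_b)
    test_X = np.column_stack([m.predict_proba(X_te)[:, 1] for m in fitted])
    return meta.predict(test_X)


results = {}
for sel_name, selector in selectors.items():
    for ens_name, make_ensemble in ensembles.items():
        model = make_pipeline(StandardScaler(), clone(selector), make_ensemble())
        model.fit(X_train, y_train)
        results[(sel_name, ens_name)] = accuracy_score(y_test, model.predict(X_test))
    # Blending is applied by hand to the features chosen by a freshly fitted selector.
    prep = make_pipeline(StandardScaler(), clone(selector)).fit(X_train, y_train)
    y_pred = blend_fit_predict(prep.transform(X_train), y_train, prep.transform(X_test))
    results[(sel_name, "blending")] = accuracy_score(y_test, y_pred)

for (sel_name, ens_name), acc in sorted(results.items()):
    print(f"{sel_name:10s} {ens_name:9s} accuracy={acc:.3f}")
```

Stacking trains its meta-learner on cross-validated predictions, while blending uses a single holdout split. Blending is therefore cheaper but sees fewer meta-training rows. The accuracies printed here describe only the synthetic data and say nothing about the paper's reported results.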
Taxonomy
Topics: Advanced Malware Detection Techniques · Internet Traffic Analysis and Secure E-voting · Spam and Phishing Detection
Methods: Logistic Regression · Feature Selection · Shapley Additive Explanations · Local Interpretable Model-Agnostic Explanations
