Detecting Cybersecurity Threats by Integrating Explainable AI with SHAP Interpretability and Strategic Data Sampling
Norrakith Srisumrith, Sunantha Sodsee

TL;DR
This paper presents an integrated XAI framework for cybersecurity threat detection that combines strategic data sampling, automated data leakage prevention, and SHAP-based interpretability to improve transparency, efficiency, and trustworthiness.
Contribution
It introduces a novel framework that unifies sampling, data integrity, and explainability techniques for more trustworthy AI in cybersecurity.
Findings
Maintains detection accuracy while reducing computational costs.
Provides clear, actionable explanations for security analysts.
Ensures data integrity through automated leakage prevention.
Abstract
The critical need for transparent and trustworthy machine learning in cybersecurity operations drives the development of this integrated Explainable AI (XAI) framework. Our methodology addresses three fundamental challenges in deploying AI for threat detection: handling massive datasets through a Strategic Sampling Methodology that preserves class distributions while enabling efficient model development; ensuring experimental rigor via Automated Data Leakage Prevention, which systematically identifies and removes contaminated features; and providing operational transparency through an Integrated XAI Implementation that uses SHAP analysis for model-agnostic interpretability across algorithms. Applied to the CIC-IDS2017 dataset, our approach maintains detection efficacy while reducing computational overhead and delivering actionable explanations for security analysts. The framework demonstrates that…
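To make the sampling idea concrete, here is a minimal sketch of class-preserving (stratified) sampling in plain Python. It is not the authors' implementation: the function name `stratified_sample`, the `fraction` and `seed` parameters, the toy label counts, and the rule of keeping at least one row per class are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Return sorted row indices for a sample that keeps each class's share roughly intact."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for indices in by_class.values():
        # Assumption: keep at least one row per class so rare attack types survive.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical toy labels, heavily imbalanced like intrusion-detection traffic.
labels = ["BENIGN"] * 900 + ["DoS"] * 90 + ["Bot"] * 10

sample_idx = stratified_sample(labels, fraction=0.1)
sample_counts = Counter(labels[i] for i in sample_idx)
print(sample_counts)
```

Sampling within each class, rather than across the whole table, is what keeps minority attack classes from vanishing when the dataset is cut down; a uniform 10% draw over the toy labels above could easily return no "Bot" rows at all.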
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Imbalanced Data Classification Techniques
