An Adversarial Approach for Explainable AI in Intrusion Detection Systems
Daniel L. Marino, Chathurika S. Wickramasinghe, Milos Manic

TL;DR
This paper introduces an adversarial explanation method for understanding misclassifications in intrusion detection systems, enhancing interpretability without altering existing classifiers.
Contribution
It presents a novel adversarial approach to generate explanations for IDS misclassifications that requires no modification of the classifier, applies to any classifier with defined gradients, and is easily extendable for further system analysis.
Findings
Generated explanations match expert knowledge
Applicable to any gradient-based classifier
Visualizations improve interpretability
Abstract
Despite the growing popularity of modern machine learning techniques (e.g. Deep Neural Networks) in cyber-security applications, most of these models are perceived as a black-box for the user. Adversarial machine learning offers an approach to increase our understanding of these models. In this paper we present an approach to generate explanations for incorrect classifications made by data-driven Intrusion Detection Systems (IDSs). An adversarial approach is used to find the minimum modifications (of the input features) required to correctly classify a given set of misclassified samples. The magnitude of such modifications is used to visualize the most relevant features that explain the reason for the misclassification. The presented methodology generated satisfactory explanations that describe the reasoning behind the mis-classifications, with descriptions that match expert knowledge.…
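The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy logistic classifier, the function name `explain_misclassification`, the regularization weight `reg`, the step size `lr`, the [0, 1] feature bounds, and the early-stopping rule are all assumptions made for illustration. The sketch takes a misclassified sample, nudges it along the input gradient until the classifier outputs the correct class, and returns the per-feature change, whose magnitude indicates which features contributed most to the misclassification.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def explain_misclassification(x0, w, b, target, reg=0.1, lr=0.1, steps=500):
    """Search for a small change to x0 that makes a toy logistic classifier
    (hypothetical weights w, bias b) predict `target`.

    Returns (x_hat, delta), where delta = x_hat - x0 is the modification.
    """
    x_hat = x0.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_hat + b)
        # Stop as soon as the sample is classified as the target class,
        # so the modification stays small (assumed stopping rule).
        if (p > 0.5) == bool(target):
            break
        # Gradient w.r.t. the input of: cross-entropy(target, p)
        # plus a penalty reg * ||x_hat - x0||^2 on the size of the change.
        grad = (p - target) * w + 2.0 * reg * (x_hat - x0)
        # Keep features inside the assumed valid range [0, 1].
        x_hat = np.clip(x_hat - lr * grad, 0.0, 1.0)
    return x_hat, x_hat - x0

# Toy example: a sample whose true label is 1 but which the classifier labels 0.
w = np.array([2.0, -1.0, 0.1])
b = -0.5
x0 = np.array([0.1, 0.8, 0.5])
x_hat, delta = explain_misclassification(x0, w, b, target=1)

# Rank features by how much they had to change to fix the prediction.
ranking = np.argsort(-np.abs(delta))
print("modification per feature:", delta)
print("features ranked by relevance:", ranking)
```

Plotting `delta` as a bar chart gives the kind of visualization the abstract refers to: features with large bars are the ones whose values pushed the classifier toward the wrong class.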
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Network Security and Intrusion Detection · Advanced Malware Detection Techniques
Methods: Interpretability
