Stabilising Explainability Fragility in Cybersecurity AI: The Impact and Mitigation of Multicollinearity in Public Benchmark Datasets
Ioannis J. Vourganas, Anna Lito Michala

TL;DR
This paper explores how multicollinearity affects the stability of AI explanations in intrusion detection systems, introduces a formal theorem, and proposes mitigation methods to improve explanation robustness.
Contribution
It introduces a formal theorem on multicollinearity's impact on explanation variance and proposes novel mitigation techniques for explainability fragility in cybersecurity AI.
Findings
Multicollinearity inflates attribution variance in explanations.
The proposed Explainability Fragility Score quantifies explanation instability.
Mitigation methods improve explanation stability without sacrificing predictive accuracy.
Abstract
This paper investigates an unexplored yet impactful vulnerability in the AI explainability used in intrusion detection systems (IDS): multicollinearity-induced instability. Despite extensive reliance on post-hoc explainability tools such as SHAP and LIME, the impact of correlated features on explanation robustness has not been evaluated. We introduce a formal theorem stating that multicollinearity inflates attribution variance. This demonstrates that explanations and feature importances are non-identifiable under multicollinearity. A suite of comprehensive experiments validates the theorem on a representative benchmark dataset, UNSW-NB15. Four widely used families of models (linear, tree-based, kernel, and neural) are evaluated across full feature sets and feature sets pruned by variance inflation factor (VIF) and correlation thresholding. We propose the novel metric of Explainability Fragility Score and two novel methods to…
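The abstract mentions pruning features by VIF but, in this truncated form, gives no procedure or threshold. As a rough illustration only, the following sketch shows one common way VIF-based pruning is done: regress each feature on the others, compute VIF = 1 / (1 - R²), and repeatedly drop the feature with the highest VIF. The function names `vif_scores` and `prune_by_vif` and the threshold of 10 (a common rule of thumb) are assumptions made here, not details taken from the paper, and the sketch assumes no column is constant.

```python
import numpy as np


def vif_scores(X):
    """Return the variance inflation factor of each column of X.

    Assumes X has shape (n_samples, n_features) and no constant columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Least-squares fit of feature j on all other features plus an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        # A perfectly collinear feature has R^2 = 1 and an infinite VIF.
        scores[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return scores


def prune_by_vif(X, threshold=10.0):
    """Iteratively drop the highest-VIF column until all VIFs <= threshold.

    Returns the indices of the retained columns. The default threshold is a
    rule-of-thumb assumption, not a value from the paper.
    """
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        scores = vif_scores(X[:, keep])
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        keep.pop(worst)
    return keep
```

Calling `prune_by_vif(X)` on a feature matrix returns the column indices to keep. Dropping one feature at a time and recomputing matters because removing a single member of a correlated group usually lowers the VIF of the rest.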
