Ranking XAI Methods for Head and Neck Cancer Outcome Prediction
Baoqiang Ma, Djennifer K. Madzia-Madzou, Rosa C.J. Kraaijveld, Jin Ouyang

TL;DR
This paper systematically evaluates 13 explainable AI methods for head and neck cancer outcome prediction, highlighting their strengths and weaknesses across multiple metrics to guide clinical interpretability.
Contribution
It provides the first comprehensive ranking of XAI techniques for HNC prognosis, emphasizing the need for multi-metric evaluation in medical AI interpretability.
Findings
Integrated Gradients and DeepLIFT consistently ranked high in faithfulness, complexity and plausibility.
Large variation was observed among XAI methods across the evaluation aspects (faithfulness, robustness, complexity and plausibility).
Evaluation framework can be extended to other medical imaging tasks.
Abstract
For head and neck cancer (HNC) patients, prognostic outcome prediction can support personalized treatment strategy selection. Improving the prediction of HNC outcomes has been extensively explored using advanced artificial intelligence (AI) techniques on PET/CT data. However, the interpretability of AI remains a critical obstacle to its clinical adoption. Unlike previous HNC studies that empirically selected explainable AI (XAI) techniques, we are the first to comprehensively evaluate and rank 13 XAI methods across 24 metrics, covering faithfulness, robustness, complexity and plausibility. Experimental results on the multi-center HECKTOR challenge dataset show large variations across evaluation aspects among different XAI methods, with Integrated Gradients (IG) and DeepLIFT (DL) consistently obtaining high rankings for faithfulness, complexity and plausibility. This work…
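To make the idea of ranking methods across several metrics concrete, here is a minimal sketch of one common aggregation scheme: rank each method per metric, then average the ranks. The excerpt above does not state how the authors aggregate their 24 metrics, so the mean-rank scheme, the function names `rank_scores` and `mean_rank`, and all scores below are assumptions made purely for illustration.

```python
from statistics import mean


def rank_scores(scores, higher_is_better=True):
    """Return {method: rank} with 1 = best; tied scores share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of methods tied with the one at position i.
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg
        i = j + 1
    return ranks


def mean_rank(table, directions):
    """Average per-metric ranks.

    table: {metric: {method: score}}
    directions: {metric: True if higher is better, else False}
    """
    per_metric = {m: rank_scores(s, directions[m]) for m, s in table.items()}
    methods = next(iter(table.values())).keys()
    return {x: mean(r[x] for r in per_metric.values()) for x in methods}


# Made-up scores for illustration only; these are NOT results from the paper.
table = {
    "faithfulness": {"IG": 0.8, "DL": 0.7, "Saliency": 0.4},
    "complexity": {"IG": 0.2, "DL": 0.3, "Saliency": 0.6},  # assumed lower is better
}
directions = {"faithfulness": True, "complexity": False}
overall = mean_rank(table, directions)
```

A lower mean rank indicates a method that does well across metrics. Averaging ranks instead of raw scores keeps metrics with different scales comparable, at the cost of discarding how large the gaps between methods are.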
