Assessing the Real-World Utility of Explainable AI for Arousal Diagnostics: An Application-Grounded User Study

Stefan Kraft; Andreas Theissler; Vera Wienhausen-Wilke; Gjergji Kasneci; Hendrik Lensch

arXiv:2510.21389 · cs.LG · October 27, 2025

TL;DR

This study evaluates how different types and timings of explainable AI assistance impact clinicians' performance, efficiency, and acceptance in sleep disorder diagnostics, demonstrating that transparent, targeted AI support improves accuracy and user trust.

Contribution

The paper presents the first application-grounded user study comparing transparent and black-box AI assistance in clinical sleep scoring, and highlights the benefits of targeted, explainable AI interventions.

Findings

01. Transparent AI assistance improves event detection by ~30% over black-box AI.

02. Clinicians prefer transparent AI and find it more trustworthy.

03. Targeted quality-control AI enhances accuracy without significantly increasing scoring time.

Abstract

Artificial intelligence (AI) systems increasingly match or surpass human experts in biomedical signal interpretation. However, their effective integration into clinical practice requires more than high predictive accuracy. Clinicians must discern when and why to trust algorithmic recommendations. This work presents an application-grounded user study with eight professional sleep medicine practitioners, who score nocturnal arousal events in polysomnographic data under three conditions: (i) manual scoring, (ii) black-box (BB) AI assistance, and (iii) transparent white-box (WB) AI assistance. Assistance is provided either from the start of scoring or as a post-hoc quality-control (QC) review. We systematically evaluate how the type and timing of assistance influence event-level and clinically most relevant count-based performance, time requirements, and…
