Extracting Expert's Goals by What-if Interpretable Modeling
Chun-Hao Chang, George Alexandru Adam, Rich Caruana, Anna Goldenberg

TL;DR
This paper introduces a method using interpretable generalized additive models to recover clinicians' reward functions from treatment data, enabling explainable AI in healthcare decision-making.
Contribution
It proposes a novel approach combining what-if reasoning with GAMs to interpret clinician behavior and recover reward functions in healthcare settings.
Findings
The model outperforms baselines in both simulation and a real-world hospital dataset
Its explanations match several clinical guidelines
The commonly used linear model often contradicts those guidelines
Abstract
Although reinforcement learning (RL) has had tremendous success in many fields, applying RL to real-world settings such as healthcare is challenging when the reward is hard to specify and no exploration is allowed. In this work, we focus on recovering clinicians' rewards in treating patients. We incorporate what-if reasoning to explain the clinician's treatments based on their potential future outcomes. We use generalized additive models (GAMs), a class of accurate, interpretable models, to recover the reward. In both simulation and a real-world hospital dataset, we show our model outperforms baselines. Finally, our model's explanations match several clinical guidelines when treating patients, while we found the commonly used linear model often contradicts them.
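To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a GAM-style reward that is a sum of per-feature shape functions (here, hypothetical piecewise-constant bins) and a what-if step that compares the predicted outcome of each candidate treatment. The feature names, bin edges, bin values, and predicted outcomes are all invented for illustration; in the paper the reward is learned from clinician data rather than hand-specified.

```python
from bisect import bisect_right

# Hypothetical shape functions: each feature has bin edges and one value per bin.
# These numbers are made up for illustration, not learned from any dataset.
SHAPES = {
    "heart_rate": {"edges": [60.0, 100.0], "values": [-1.0, 0.5, -1.0]},
    "blood_pressure": {"edges": [65.0], "values": [-2.0, 0.3]},
}


def gam_reward(state):
    """Additive reward: the sum of each feature's shape-function value."""
    total = 0.0
    for name, shape in SHAPES.items():
        # Locate the bin that the feature value falls into.
        idx = bisect_right(shape["edges"], state[name])
        total += shape["values"][idx]
    return total


def best_what_if(outcomes):
    """Return the action whose predicted future state has the highest reward."""
    return max(outcomes, key=lambda action: gam_reward(outcomes[action]))


# Hypothetical predicted future states under each candidate action.
outcomes = {
    "no_treatment": {"heart_rate": 115.0, "blood_pressure": 60.0},
    "treatment": {"heart_rate": 90.0, "blood_pressure": 70.0},
}

if __name__ == "__main__":
    for action, state in outcomes.items():
        print(action, gam_reward(state))
    print("preferred:", best_what_if(outcomes))
```

Because the reward is additive, each feature's contribution can be read off its own shape function, which is what makes this model class inspectable against clinical guidelines.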
Taxonomy
Topics: Machine Learning in Healthcare
