Data Science Principles for Interpretable and Explainable AI
Kris Sankaran

TL;DR
This paper reviews key principles and techniques for making AI models more interpretable and explainable, emphasizing transparency, user control, and evaluation criteria to address deployment risks.
Contribution
It synthesizes interpretability concepts, connects them to classical principles, illustrates basic techniques, and discusses evaluation and open challenges in explainable AI.
Findings
Basic explainability techniques, such as learned embeddings and integrated gradients, can help make complex models more transparent.
Audience goals are crucial in designing interpretability methods.
Evaluation criteria help assess interpretability approaches.
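As a rough illustration of one technique named in the findings, the sketch below approximates integrated gradients for a toy logistic model. It is not taken from the paper: the model, its weights `w` and bias `b`, the all-zeros baseline, and the helper names are all assumptions chosen for illustration. Each feature's attribution is the input-minus-baseline difference multiplied by the gradient averaged along the straight path from baseline to input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy logistic model F(x) = sigmoid(w . x + b) (hypothetical)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of F with respect to each input feature."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        g = model_grad(point, w, b)
        avg_grad = [a + gi / steps for a, gi in zip(avg_grad, g)]
    # Scale the path-averaged gradient by the input-minus-baseline difference.
    return [(xi - b0) * a for xi, b0, a in zip(x, baseline, avg_grad)]

# Illustrative values only; none of these come from the paper.
w = [2.0, -1.0, 0.5]
b = 0.1
x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)
gap = model(x, w, b) - model(baseline, w, b)
print(attributions, sum(attributions), gap)
```

A useful sanity check is the completeness property: the attributions should sum approximately to the difference between the model output at the input and at the baseline, which is what the last line prints. The choice of baseline is itself a modeling decision and can change the attributions substantially.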
Abstract
Society's capacity for algorithmic problem-solving has never been greater. Artificial Intelligence is now applied across more domains than ever, a consequence of powerful abstractions, abundant data, and accessible software. As capabilities have expanded, so have risks, with models often deployed without fully understanding their potential impacts. Interpretable and interactive machine learning aims to make complex models more transparent and controllable, enhancing user agency. This review synthesizes key principles from the growing literature in this field. We first introduce precise vocabulary for discussing interpretability, like the distinction between glass box and explainable models. We then explore connections to classical statistical and design principles, like parsimony and the gulfs of interaction. Basic explainability techniques -- including learned embeddings, integrated…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
