Towards Accountable AI: Hybrid Human-Machine Analyses for Characterizing System Failure

Besmira Nushi; Ece Kamar; Eric Horvitz

arXiv:1809.07424·cs.LG·September 21, 2018

TL;DR

This paper introduces Pandora, a hybrid human-machine approach for detailed analysis and explanation of system failures in complex AI systems, improving understanding beyond traditional aggregate metrics.

Contribution

Pandora provides a novel set of tools combining human insights and system data to characterize failures in multi-component machine learning systems.

Findings

01

Pandora enables detailed failure analysis in image captioning systems.

02

A case study shows that Pandora improves debugging and system understanding.

03

Hybrid methods reveal failure conditions related to input and architecture.

Abstract

As machine learning systems move from computer-science laboratories into the open world, their accountability becomes a high-priority problem. Accountability requires deep understanding of system behavior and its failures. Current evaluation methods such as single-score error metrics and confusion matrices provide aggregate views of system performance that hide important shortcomings. Understanding details about failures is important for identifying pathways for refinement, communicating the reliability of systems in different settings, and specifying appropriate human oversight and engagement. Characterization of failures and shortcomings is particularly complex for systems composed of multiple machine-learned components. For such systems, existing evaluation methods have limited expressiveness in describing and explaining the relationship among input content, the internal states…
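The abstract's point that a single-score error metric can hide important shortcomings can be illustrated with a small sketch. This is not Pandora's method or code. The records, the condition names (`simple_scene`, `crowded_scene`), and the helper functions are all hypothetical and invented for illustration only.

```python
from collections import defaultdict

# Hypothetical evaluation records: (input condition, was the output correct?).
# The counts are made up for illustration and do not come from the paper.
records = (
    [("simple_scene", True)] * 90
    + [("simple_scene", False)] * 2
    + [("crowded_scene", True)] * 2
    + [("crowded_scene", False)] * 6
)


def error_rate(outcomes):
    """Fraction of incorrect outcomes in a list of booleans (True = correct)."""
    return sum(1 for ok in outcomes if not ok) / len(outcomes)


def error_by_condition(rows):
    """Group outcomes by input condition and compute the error rate per group."""
    groups = defaultdict(list)
    for condition, ok in rows:
        groups[condition].append(ok)
    return {condition: error_rate(oks) for condition, oks in groups.items()}


# Aggregate view: one number for the whole evaluation set.
overall = error_rate([ok for _, ok in records])

# Sliced view: the same errors, broken down by input condition.
per_condition = error_by_condition(records)

print(f"overall error: {overall:.2f}")
for condition, rate in per_condition.items():
    print(f"{condition}: {rate:.2f}")
```

In this toy data the aggregate error looks low, while nearly all of the failures sit in one input condition. Breaking errors down by condition is one simple way to surface that kind of concentration.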
