On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems
Besmira Nushi, Ece Kamar, Eric Horvitz, Donald Kossmann

TL;DR
This paper presents a human-in-the-loop troubleshooting methodology for complex machine learning systems, enabling better error diagnosis and system improvement guidance through simulated fixes and human computation tasks.
Contribution
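It introduces a methodology that uses human computation tasks to simulate fixes to individual components of an integrative machine learning pipeline, then measures the resulting change in overall system behavior so that designers can see which component improvements matter most.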
Findings
Effective identification of failure points in ML pipelines
Guidance for system designers on optimal improvements
Successful application to a real-world image captioning system
Abstract
We study the problem of troubleshooting machine learning systems that rely on analytical pipelines of distinct components. Understanding and fixing errors that arise in such integrative systems is difficult as failures can occur at multiple points in the execution workflow. Moreover, errors can propagate, become amplified or be suppressed, making blame assignment difficult. We propose a human-in-the-loop methodology which leverages human intellect for troubleshooting system failures. The approach simulates potential component fixes through human computation tasks and measures the expected improvements in the holistic behavior of the system. The method provides guidance to designers about how they can best improve the system. We demonstrate the effectiveness of the approach on an automated image captioning system that has been pressed into real-world use.
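The idea of simulating component fixes and measuring their effect on the whole system can be illustrated with a small sketch. This is not the authors' implementation: the names `run_pipeline` and `expected_improvement`, the two-stage toy pipeline, and the exact-match score are all assumptions made for illustration. The human-corrected outputs, which in the paper come from human computation tasks, are hard-coded here.

```python
from typing import Callable, Dict, List, Optional

Component = Callable[[object], object]


def run_pipeline(components: List[Component], x: object,
                 fixes: Optional[Dict[int, object]] = None) -> object:
    """Run components in order. If `fixes` has an entry for stage i, that
    (human-corrected) value replaces the stage's own output."""
    fixes = fixes or {}
    for i, component in enumerate(components):
        x = fixes[i] if i in fixes else component(x)
    return x


def expected_improvement(components: List[Component], inputs: List[object],
                         human_fixes: List[Dict[int, object]],
                         score: Callable[[object, object], float]) -> Dict[int, float]:
    """For each stage, the average gain in final score when only that stage's
    output is replaced by its human-corrected version."""
    n = len(inputs)
    baseline = sum(score(run_pipeline(components, x), x) for x in inputs) / n
    gains = {}
    for i in range(len(components)):
        fixed = sum(
            score(run_pipeline(components, x, {i: human_fixes[k][i]}), x)
            for k, x in enumerate(inputs)
        ) / n
        gains[i] = fixed - baseline
    return gains


# Hypothetical two-stage captioning pipeline: word detector, then caption composer.
detected = {"img1": ["dog", "frisbee"], "img2": ["cat"]}  # img2 really shows a dog
components = [
    lambda img: detected[img],
    lambda words: "a " + " and a ".join(words),
]
gold = {"img1": "a dog and a frisbee", "img2": "a dog"}

# Stand-in for human computation: corrected output of each stage, given the
# input that stage actually received in the unmodified run.
human_fixes = [
    {0: ["dog", "frisbee"], 1: "a dog and a frisbee"},
    {0: ["dog"], 1: "a cat"},  # the composer was faithful to its (wrong) input
]


def exact_match(caption: object, img: object) -> float:
    """Toy quality score: 1.0 if the final caption equals the gold caption."""
    return 1.0 if caption == gold[img] else 0.0


gains = expected_improvement(components, ["img1", "img2"], human_fixes, exact_match)
print(gains)
```

In this toy setup, fixing the detector recovers the failed caption while fixing the composer does not, so the sketch would point a designer toward the detector. A real system would replace the exact-match score with a human or automatic evaluation of the final output.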
