Debugging Machine Learning Tasks

Aleksandar Chakarov, Aditya Nori, Sriram Rajamani, Shayak Sen, and Deepak Vijaykeerthy

arXiv:1603.07292 · cs.LG · March 24, 2016 · 24 cites

TL;DR

This paper introduces Psi, an automated, causality-based tool for debugging data errors in machine learning classification tasks. It addresses a gap: existing debugging tools target errors in code rather than errors in data.

Contribution

The paper proposes a method that combines Pearl's theory of causation with probabilistic programming to identify the root causes of misclassifications arising from errors in the training data.

Findings

01

Psi identifies the training-data errors that are root causes of misclassifications.

02

The method uses Pearl's PS (Probability of Sufficiency) as its causal scoring metric.

03

Experimental results show Psi's utility on real datasets.

Abstract

Unlike traditional programs (such as operating systems or word processors) which have large amounts of code, machine learning tasks use programs with relatively small amounts of code (written in machine learning libraries), but voluminous amounts of data. Just like developers of traditional programs debug errors in their code, developers of machine learning tasks debug and fix errors in their data. However, algorithms and tools for debugging and fixing errors in data are less common, when compared to their counterparts for detecting and fixing errors in code. In this paper, we consider classification tasks where errors in training data lead to misclassifications in test points, and propose an automated method to find the root causes of such misclassifications. Our root cause analysis is based on Pearl's theory of causation, and uses Pearl's PS (Probability of Sufficiency) as a scoring…
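
For reference, Pearl's PS for a binary cause X and a binary effect Y is the probability that setting X = x would have produced Y = y, given that in fact X = x' and Y = y' were observed. In general, observational data only bounds PS; under the additional assumptions of exogeneity and monotonicity it reduces to the closed form PS = [P(y | x) - P(y | x')] / [1 - P(y | x')]. The sketch below computes that closed form. It is a minimal illustration under those two assumptions, not Psi's implementation: the function name is hypothetical, and how Psi maps training-data causes and misclassifications onto X and Y is not stated in the truncated abstract above.

```python
def probability_of_sufficiency(p_y_given_x: float, p_y_given_not_x: float) -> float:
    """Pearl's Probability of Sufficiency (PS) for a binary cause and effect.

    Hypothetical helper. Assumes exogeneity and monotonicity, under which
    PS = [P(y | x) - P(y | x')] / [1 - P(y | x')].
    """
    for p in (p_y_given_x, p_y_given_not_x):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Monotonicity implies the effect is never less likely when the cause is present.
    if p_y_given_x < p_y_given_not_x:
        raise ValueError("monotonicity requires P(y | x) >= P(y | x')")
    # If the effect always occurs without the cause, PS is undefined.
    if p_y_given_not_x == 1.0:
        raise ValueError("PS is undefined when P(y | x') = 1")
    return (p_y_given_x - p_y_given_not_x) / (1.0 - p_y_given_not_x)
```

For example, probability_of_sufficiency(0.75, 0.5) returns 0.5: among cases where neither the cause nor the effect occurred, the cause would have been sufficient to produce the effect half the time.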

Taxonomy

Topics: Machine Learning and Data Classification · Software Engineering Research · Software Reliability and Analysis Research