Certified Computation from Unreliable Datasets

Themis Gouleakis; Christos Tzamos; Manolis Zampetakis

arXiv:1709.03926 · cs.GT · June 14, 2018 · 1 citation

TL;DR

This paper introduces a verification-based approach to learning from unreliable datasets: by verifying only a small, critical subset of records, it guarantees high-quality outcomes for a range of optimization objectives, including some NP-complete problems.

Contribution

The work presents a generic, instance-optimal verification method that certifies correctness with few checks, applicable to a broad class of functions satisfying Lipschitz conditions.

Findings

01. Few verifications suffice for high-accuracy guarantees.

02. The method applies even to some NP-complete problems.

03. Invalid records are identified and removed to ensure accuracy.

Abstract

A wide range of learning tasks require human input to label massive amounts of data. The collected data, though, are usually of low quality and contain inaccuracies and errors. As a result, modern science and business face the problem of learning from unreliable data sets. In this work, we provide a generic approach based on verification of only a few records of the data set to guarantee high-quality learning outcomes for various optimization objectives. Our method identifies small sets of critical records and verifies their validity. We show that many problems need only $\mathrm{poly}(1/\varepsilon)$ verifications to ensure that the output of the computation is at most a factor of $(1 \pm \varepsilon)$ away from the truth. For any given instance, we provide an instance-optimal solution that verifies the minimum possible number of records to approximately certify…
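
The idea of verifying only a small set of critical records can be illustrated with a toy case. The sketch below is not the paper's algorithm. It assumes the quantity of interest is the maximum over the valid records and that a verification oracle is available. The names `certified_max` and `is_valid` are hypothetical. Records are checked in decreasing order of value. Invalid ones are discarded, and the first valid record certifies the answer exactly, since every unchecked record is no larger.

```python
from typing import Callable, List, Tuple


def certified_max(
    records: List[float],
    is_valid: Callable[[int], bool],
) -> Tuple[float, int]:
    """Return (maximum over valid records, number of verifications used).

    `is_valid(i)` is a hypothetical verification oracle for record i.
    """
    # Candidate order: indices sorted by value, largest first.
    order = sorted(range(len(records)), key=lambda i: records[i], reverse=True)
    checks = 0
    for i in order:
        checks += 1
        if is_valid(i):
            # All remaining records are no larger, so whether they are
            # valid or not cannot change the maximum: stop verifying.
            return records[i], checks
        # Record i failed verification: treat it as removed and continue.
    raise ValueError("no valid record found")


# Record 1 (value 9.0) is assumed invalid in this example.
value, used = certified_max([3.0, 9.0, 7.0, 1.0], lambda i: i != 1)
print(value, used)  # 7.0 2
```

In this toy case the number of verifications is one plus the number of invalid records that exceed the true maximum, regardless of the size of the data set.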

Taxonomy

Topics: Bayesian Modeling and Causal Inference