Satisfying Real-world Goals with Dataset Constraints

Gabriel Goh; Andrew Cotter; Maya Gupta; Michael Friedlander

arXiv:1606.07558 · cs.LG · May 5, 2017 · 80 cites

TL;DR

This paper introduces a method for training classifiers that satisfy several real-world goals at once, where each goal may be defined on a different dataset. It does so by training with dataset constraints and approximately solving the resulting problem with an efficient optimization algorithm.

Contribution

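It presents an approach that handles multiple goals by combining dataset constraints with the ramp penalty, which accurately quantifies the cost of each goal, together with an efficient algorithm that approximately optimizes the resulting non-convex constrained problem.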
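To illustrate why a ramp penalty is attractive for counting-based goals, the sketch below compares three ways of estimating the fraction of positive predictions from real-valued scores: the exact 0/1 indicator, the convex hinge, and a bounded ramp. The ramp parameterization `clip(0.5 + z, 0, 1)` and all function names are assumptions for illustration, not taken from the paper. The point is only that the ramp, like the indicator, stays in [0, 1], while a single large score can push a hinge-based rate estimate above 1.

```python
import numpy as np

def indicator(z):
    """Exact 0/1 count: 1.0 where the score predicts positive (z > 0), else 0.0."""
    return (np.asarray(z, dtype=float) > 0).astype(float)

def hinge(z):
    """Convex upper bound on the indicator, max(0, 1 + z); unbounded above."""
    return np.maximum(0.0, 1.0 + np.asarray(z, dtype=float))

def ramp(z):
    """Bounded ramp clip(0.5 + z, 0, 1); this parameterization is an assumption."""
    return np.clip(0.5 + np.asarray(z, dtype=float), 0.0, 1.0)

scores = np.array([-3.0, -0.25, 0.25, 5.0])

# Estimated positive-prediction rate under each penalty.
print(float(indicator(scores).mean()))  # 0.5
print(float(hinge(scores).mean()))      # 2.0
print(float(ramp(scores).mean()))       # 0.5
```

Here the hinge reports a "rate" of 2.0 because the score of 5.0 alone contributes 6, whereas the ramp agrees with the exact rate of 0.5.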

Findings

1. Effective on benchmark datasets
2. Successful in real-world industry applications
3. Outperforms traditional single-goal training methods

Abstract

The goal of minimizing misclassification error on a training set is often just one of several real-world goals that might be defined on different datasets. For example, one may require a classifier to also make positive predictions at some specified rate for some subpopulation (fairness), or to achieve a specified empirical recall. Other real-world goals include reducing churn with respect to a previously deployed model, or stabilizing online training. In this paper we propose handling multiple goals on multiple datasets by training with dataset constraints, using the ramp penalty to accurately quantify costs, and present an efficient algorithm to approximately optimize the resulting non-convex constrained optimization problem. Experiments on both benchmark and real-world industry datasets demonstrate the effectiveness of our approach.
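The goals named in the abstract can each be written as a rate computed on some dataset. The following is a minimal sketch, assuming a linear scorer and the same illustrative ramp `clip(0.5 + z, 0, 1)`; all function and variable names, and the toy data, are hypothetical. Each quantity can be computed exactly (with the indicator) or with the ramp surrogate, and a dataset constraint is then a bound on it, such as a minimum positive-prediction rate on a subpopulation, a minimum recall, or a maximum churn.

```python
import numpy as np

def ramp(z):
    # Assumed ramp parameterization clip(0.5 + z, 0, 1); bounded in [0, 1].
    return np.clip(0.5 + np.asarray(z, dtype=float), 0.0, 1.0)

def scores(w, b, X):
    # Linear classifier score; a positive score means "predict positive".
    return X @ w + b

def positive_rate(s, surrogate=False):
    # Fraction of examples predicted positive (e.g. on a subpopulation, for fairness).
    return float(np.mean(ramp(s) if surrogate else (s > 0)))

def recall(s, y, surrogate=False):
    # Positive-prediction rate restricted to the examples whose label is positive.
    return positive_rate(s[y == 1], surrogate)

def churn(s, old_pred, surrogate=False):
    # Fraction of examples whose prediction differs from a previously deployed model;
    # old_pred is a boolean array of the old model's positive predictions.
    if surrogate:
        flips = np.where(old_pred, ramp(-s), ramp(s))
    else:
        flips = (s > 0) != old_pred
    return float(np.mean(flips))

# Toy data (hypothetical): four examples in 2-D.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1, 1, 0, 0])                        # true labels
group = np.array([True, False, True, False])      # subpopulation membership
old_pred = np.array([True, False, False, False])  # previous model's predictions
w, b = np.array([1.0, 1.0]), 0.0

s = scores(w, b, X)
print(positive_rate(s[group]))  # 0.5
print(recall(s, y))             # 1.0
print(churn(s, old_pred))       # 0.25
```

With these toy inputs the exact and ramp values coincide because every score is at least 0.5 away from the decision boundary; for scores near zero the ramp values become fractional. The sketch only evaluates the constraint quantities; it does not implement the paper's optimization algorithm.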

Taxonomy

Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Data Stream Mining Techniques