Classification with Strategically Withheld Data

Anilesh K. Krishnaswamy, Haoming Li, David Rein, Hanrui Zhang, and Vincent Conitzer

arXiv:2012.10203 · cs.LG · January 15, 2021

TL;DR

This paper addresses strategic feature withholding in classification, where agents hide features to be classified more favorably. It proposes three methods (Mincut, HC, and IC-LR) for training classifiers that are robust to this behavior.

Contribution

The paper introduces three classification methods, Mincut, Hill-Climbing (HC), and Incentive-Compatible Logistic Regression (IC-LR), for training classifiers that are robust to strategic feature withholding.

Findings

01

Mincut is optimal when the true data distribution is fully known

02

HC is a simpler, convergent hierarchical ensemble

03

IC-LR removes incentives to hide features

Abstract

Machine learning techniques can be useful in applications such as credit approval and college admission. However, to be classified more favorably in such contexts, an agent may decide to strategically withhold some of her features, such as bad test scores. This is a missing data problem with a twist: which data is missing depends on the chosen classifier, because the specific classifier is what may create the incentive to withhold certain feature values. We address the problem of training classifiers that are robust to this behavior. We design three classification methods: Mincut, Hill-Climbing (HC) and Incentive-Compatible Logistic Regression (IC-LR). We show that Mincut is optimal when the true distribution of data is fully known. However, it can produce complex decision boundaries, and hence be prone to overfitting in some cases. Based on a…
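To make the incentive issue described in the abstract concrete, the following is a minimal sketch of a brute-force check for whether a classifier can be gamed by withholding features. It is not the paper's Mincut, HC, or IC-LR method. The `MISSING` marker, the two toy classifiers, the tiny discrete feature domain, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product

MISSING = None  # assumed marker for a withheld feature (illustrative only)


def withheld_versions(x):
    """Yield every report an agent with features x can make by hiding
    any subset of the features that x currently reveals."""
    revealed = [i for i, v in enumerate(x) if v is not MISSING]
    for k in range(len(revealed) + 1):
        for hide in combinations(revealed, k):
            yield tuple(MISSING if i in hide else v for i, v in enumerate(x))


def is_incentive_compatible(classifier, domain):
    """Return True if no agent in `domain` obtains a strictly higher
    label (1 = accept, 0 = reject) by withholding features."""
    for x in domain:
        truthful = classifier(x)
        if any(classifier(r) > truthful for r in withheld_versions(x)):
            return False
    return True


def impute_then_threshold(x, fill=1, threshold=2):
    """Toy classifier: replace missing values with `fill`, accept if the
    sum reaches `threshold`. Hiding a low value can raise the sum."""
    total = sum(fill if v is MISSING else v for v in x)
    return int(total >= threshold)


def revealed_only_threshold(x, threshold=2):
    """Toy classifier: missing values count as 0, so with non-negative
    features hiding a value can never raise the sum."""
    total = sum(0 if v is MISSING else v for v in x)
    return int(total >= threshold)


# Two features, each taking a value in {0, 1, 2}.
domain = list(product([0, 1, 2], repeat=2))

gameable = not is_incentive_compatible(impute_then_threshold, domain)
robust = is_incentive_compatible(revealed_only_threshold, domain)
```

In this toy setting, an agent with features `(0, 1)` is rejected by `impute_then_threshold` when reporting truthfully but accepted after hiding the first feature, which is the kind of classifier-dependent missingness the abstract describes. The second classifier avoids this only because of the assumed non-negative feature values.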

Code & Models

Repositories

haoming-codes/str-withheld (official)

Videos

Classification with Strategically Withheld Data · Underline

Taxonomy

Topics: Auction Theory and Applications · Imbalanced Data Classification Techniques · Privacy-Preserving Technologies in Data

Methods: Logistic Regression