Classification with Strategically Withheld Data
Anilesh K. Krishnaswamy, Haoming Li, David Rein, Hanrui Zhang, and Vincent Conitzer

TL;DR
This paper addresses the challenge of strategic feature withholding in classification tasks, proposing three robust methods (Mincut, HC, and IC-LR) to prevent incentives for hiding data and improve classifier robustness.
Contribution
The paper introduces novel classification algorithms that are incentive-compatible, effectively preventing strategic feature withholding and enhancing robustness against such behavior.
Findings
Mincut is optimal with known data distribution
HC is a simpler, convergent hierarchical ensemble
IC-LR removes incentives to hide features
Abstract
Machine learning techniques can be useful in applications such as credit approval and college admission. However, to be classified more favorably in such contexts, an agent may decide to strategically withhold some of her features, such as bad test scores. This is a missing data problem with a twist: which data is missing depends on the chosen classifier, because the specific classifier is what may create the incentive to withhold certain feature values. We address the problem of training classifiers that are robust to this behavior. We design three classification methods: Mincut, Hill-Climbing (HC) and Incentive-Compatible Logistic Regression (IC-LR). We show that Mincut is optimal when the true distribution of data is fully known. However, it can produce complex decision boundaries, and hence be prone to overfitting in some cases. Based on a…
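The withholding incentive described in the abstract can be illustrated with a small Python sketch. This is not the paper's Mincut, HC, or IC-LR method. It is only a brute-force check, under the assumption that a classifier counts as incentive-compatible when no agent can get a strictly better label by hiding some of her features. The names `withholdings`, `best_report`, `is_incentive_compatible`, `naive_clf`, and `cautious_clf`, and the threshold of 50, are hypothetical and chosen for illustration.

```python
from itertools import combinations


def withholdings(x):
    """Yield every report obtainable from x by hiding (setting to None)
    some subset of its revealed features, including hiding nothing."""
    revealed = [i for i, v in enumerate(x) if v is not None]
    for k in range(len(revealed) + 1):
        for hidden in combinations(revealed, k):
            yield tuple(None if i in hidden else v for i, v in enumerate(x))


def best_report(clf, x):
    """Return a report that maximizes the agent's label under clf."""
    return max(withholdings(x), key=clf)


def is_incentive_compatible(clf, points):
    """Assumed definition: truthful reporting is never worse than withholding."""
    return all(clf(x) >= clf(r) for x in points for r in withholdings(x))


def naive_clf(x):
    # Hypothetical rule: reject (0) only if a revealed score is below 50.
    # Missing scores are ignored, which rewards hiding bad ones.
    return 0 if any(v is not None and v < 50 for v in x) else 1


def cautious_clf(x):
    # Hypothetical rule: accept (1) only if every score is revealed
    # and at least 50, so hiding a feature can never help.
    return 1 if all(v is not None and v >= 50 for v in x) else 0


points = [(80, 30), (80, 90), (20, 10)]
print(is_incentive_compatible(naive_clf, points))
print(is_incentive_compatible(cautious_clf, points))
print(best_report(naive_clf, (80, 30)))
```

Under `naive_clf`, the agent with scores `(80, 30)` is better off hiding the second score, so the classifier itself creates the missing data. The `cautious_clf` rule removes that incentive, at the cost of rejecting anyone with a genuinely missing feature. The enumeration is exponential in the number of features and is meant only for toy inputs.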
Taxonomy
Topics: Auction Theory and Applications · Imbalanced Data Classification Techniques · Privacy-Preserving Technologies in Data
Methods: Logistic Regression
