Certifying Data-Bias Robustness in Linear Regression

Anna P. Meyer; Aws Albarghouthi; Loris D'Antoni

arXiv:2206.03575·cs.LG·June 9, 2022

TL;DR

This paper introduces methods to certify whether linear regression models are pointwise-robust to label bias in the training data: an exact technique for individual test points, and a more scalable approximate technique that does not need to know the test point in advance. Both are evaluated across datasets.

Contribution

The paper presents the first methods, one exact and one approximate but more scalable, for certifying the pointwise robustness of linear models to label bias in training data.

Findings

01. Linear models often show high bias-robustness.

02. Gaps in robustness exist under certain bias assumptions.

03. The approach guides trust in model outputs.

Abstract

Datasets typically contain inaccuracies due to human error and societal biases, and these inaccuracies can affect the outcomes of models trained on such datasets. We present a technique for certifying whether linear regression models are pointwise-robust to label bias in the training dataset, i.e., whether bounded perturbations to the labels of a training dataset result in models that change the prediction of test points. We show how to solve this problem exactly for individual test points, and provide an approximate but more scalable method that does not require advance knowledge of the test point. We extensively evaluate both techniques and find that linear models -- both regression- and classification-based -- often display high levels of bias-robustness. However, we also unearth gaps in bias-robustness, such as high levels of non-robustness for certain bias assumptions on some…
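
The pointwise check described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper. It assumes ordinary least squares without regularization, and a hypothetical bias model in which at most `max_flips` training labels are each shifted by at most `delta`. Under those assumptions the prediction at a test point is a linear function of the label vector, so its worst-case range has a closed form. The function names and the tolerance `eps` are illustrative choices, not the authors' notation.

```python
import numpy as np

def prediction_range(X, y, x_test, max_flips, delta):
    """Worst-case range of the OLS prediction at x_test when at most
    `max_flips` labels are each perturbed by at most `delta` (assumed bias model)."""
    # The OLS prediction is x^T (X^T X)^{-1} X^T y = z . y,
    # where z = X (X^T X)^{-1} x has one weight per training label.
    z = X @ np.linalg.solve(X.T @ X, x_test)
    base = float(z @ y)
    # The largest shift comes from perturbing the labels with the largest |z_i|,
    # each by the full budget delta, in the direction of sign(z_i).
    largest = np.sort(np.abs(z))[::-1][:max_flips]
    slack = delta * float(largest.sum())
    return base - slack, base + slack

def is_pointwise_robust(X, y, x_test, max_flips, delta, eps):
    """True if no allowed label perturbation moves the prediction by more than eps."""
    lo, hi = prediction_range(X, y, x_test, max_flips, delta)
    return (hi - lo) / 2.0 <= eps

# Toy data (made up for illustration): an intercept column plus one feature.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
x_test = np.array([1.0, 1.5])

lo, hi = prediction_range(X, y, x_test, max_flips=2, delta=1.0)
print(f"prediction range: [{lo:.3f}, {hi:.3f}]")
print("robust at eps=0.6:", is_pointwise_robust(X, y, x_test, 2, 1.0, 0.6))
```

Because the check only needs the weight vector `z`, it costs about as much as one least-squares solve per test point. Handling other bias models or classification thresholds would require changing how the slack is computed.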

Taxonomy

Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Linear Regression