Distribution-Independent Regression for Generalized Linear Models with Oblivious Corruptions

Ilias Diakonikolas; Sushrut Karmalkar; Jongho Park; Christos Tzamos

arXiv:2309.11657·cs.DS·September 29, 2023

TL;DR

This paper introduces the first algorithms for robust regression of generalized linear models (GLMs) under additive oblivious noise; they tolerate arbitrary corruption of more than half the samples.

Contribution

The paper gives a distribution-independent algorithm for GLM regression with oblivious noise, states conditions for identifiability, and describes a method that returns either an accurate solution or a list of candidates.

Findings

01

The algorithm tolerates corruption of more than half the samples

02

Provides necessary and sufficient conditions for identifiability

03

First to address GLMs with oblivious noise beyond linear regression

Abstract

We demonstrate the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We assume we have sample access to examples $(x, y)$ where $y$ is a noisy measurement of $g(w^{*} \cdot x)$. In particular, the noisy labels are of the form $y = g(w^{*} \cdot x) + \xi + \epsilon$, where $\xi$ is the oblivious noise drawn independently of $x$ and satisfies $\Pr[\xi = 0] \geq o(1)$, and $\epsilon \sim \mathcal{N}(0, \sigma^{2})$. Our goal is to accurately recover a parameter vector $w$ such that the function $g(w \cdot x)$ has arbitrarily small error when compared to the true values $g(w^{*} \cdot x)$, rather than the noisy measurements $y$. We present an algorithm that tackles this problem in its most general distribution-independent setting, where the solution may not even be identifiable.…
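
As a rough illustration of the noise model described in the abstract, the following Python sketch draws synthetic samples of the form $y = g(w^{*} \cdot x) + \xi + \epsilon$ and measures error against the clean values $g(w^{*} \cdot x)$ rather than the noisy labels. It is not the paper's algorithm. The ReLU link, the Gaussian covariates, the uniform distribution for nonzero $\xi$, and the names `sample_glm_oblivious` and `clean_error` are all assumptions made for this example; the paper's setting places no such restriction on the distribution of $x$.

```python
import numpy as np


def relu(z):
    """Example link function g (an assumption; any GLM link could be used)."""
    return np.maximum(z, 0.0)


def sample_glm_oblivious(n, w_star, alpha=0.3, sigma=0.1, g=relu, seed=0):
    """Draw n samples with y = g(w_star . x) + xi + eps.

    alpha is the probability that the oblivious noise xi is exactly zero.
    Returns (x, y, clean, xi), where clean = g(w_star . x).
    """
    rng = np.random.default_rng(seed)
    d = w_star.shape[0]

    # Covariates: standard Gaussian purely for illustration.
    x = rng.normal(size=(n, d))
    clean = g(x @ w_star)

    # Oblivious noise: drawn without looking at x, zero with probability alpha.
    is_uncorrupted = rng.random(n) < alpha
    xi = np.where(is_uncorrupted, 0.0, rng.uniform(-50.0, 50.0, size=n))

    # Additive Gaussian measurement noise with standard deviation sigma.
    eps = rng.normal(scale=sigma, size=n)

    y = clean + xi + eps
    return x, y, clean, xi


def clean_error(w, x, clean, g=relu):
    """Mean squared error of g(w . x) against the clean values, not against y."""
    return float(np.mean((g(x @ w) - clean) ** 2))


if __name__ == "__main__":
    w_star = np.array([1.0, -2.0, 0.5])
    x, y, clean, xi = sample_glm_oblivious(2000, w_star, alpha=0.3)
    print("fraction with xi == 0:", np.mean(xi == 0.0))
    print("clean error of w_star:", clean_error(w_star, x, clean))
    print("error of w_star against noisy y:", np.mean((relu(x @ w_star) - y) ** 2))
```

With `alpha=0.3`, roughly 70% of the labels carry a nonzero $\xi$, which is the regime in which more than half the samples are corrupted. The last two printed quantities show why the target is the clean value: the true $w^{*}$ has zero error against $g(w^{*} \cdot x)$ but a large error against $y$.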

Taxonomy

Topics: Machine Learning and Algorithms · Statistical Methods and Inference · Stochastic Gradient Optimization Techniques

Methods: GLM