Robust empirical risk minimization via Newton's method

Eirini Ioannou; Muni Sreenivas Pydi; Po-Ling Loh

arXiv:2301.13192 · stat.ML · July 18, 2023

TL;DR

This paper introduces a robust Newton's method for empirical risk minimization that uses robust estimators for gradients and Hessians, achieving faster convergence in high-dimensional and contaminated data scenarios.

Contribution

It develops a new robust Newton's method with convergence guarantees and practical algorithms for high-dimensional, contaminated, or heavy-tailed data.

Findings

01

Proves convergence of the robust Newton method to a small neighborhood of the true minimizer.

02

Shows that the robust variant retains quadratic convergence rates similar to those of classical Newton's method.

03

Provides an algorithm suitable for high-dimensional problems with contaminated data.
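To illustrate how a conjugate gradient (CG) solver can produce a Newton direction without forming or inverting a full Hessian, here is a minimal sketch. It is not the paper's algorithm: the function names, the squared-loss setting, and the use of a coordinate-wise median-of-means estimate for Hessian-vector products are all assumptions made for illustration.

```python
import numpy as np

def conjugate_gradient(hvp, g, n_steps, tol=1e-10):
    """Approximately solve H p = g using only the map v -> H v.

    `hvp` is any callable returning a Hessian-vector product; `tol` is a
    threshold on the squared residual norm. (Hypothetical helper.)
    """
    p = np.zeros_like(g)
    r = g.copy()            # residual g - H p, starting from p = 0
    d = r.copy()            # first search direction
    rs = r @ r
    for _ in range(n_steps):
        if rs <= tol:
            break
        Hd = hvp(d)
        curvature = d @ Hd
        if curvature <= 0:  # stop if the estimated curvature is not positive
            break
        alpha = rs / curvature
        p = p + alpha * d
        r = r - alpha * Hd
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p

def robust_hvp_least_squares(X, v, n_blocks):
    """Median-of-means estimate of H v for squared loss (an assumed stand-in).

    Row i of `per_sample` is x_i (x_i^T v), the per-sample Hessian-vector
    product, so no d x d matrix is ever formed.
    """
    per_sample = X * (X @ v)[:, None]
    blocks = np.array_split(per_sample, n_blocks, axis=0)
    block_means = np.stack([b.mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)
```

Given a robust gradient estimate `g`, a call such as `conjugate_gradient(lambda v: robust_hvp_least_squares(X, v, 40), g, n_steps=10)` would return an approximate direction `p`, and the iterate would move to `theta - p`. Because the median-of-means map is not exactly linear in `v`, the usual CG guarantees do not carry over directly.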

Abstract

A new variant of Newton's method for empirical risk minimization is studied, where at each iteration of the optimization algorithm, the gradient and Hessian of the objective function are replaced by robust estimators taken from existing literature on robust mean estimation for multivariate data. After proving a general theorem about the convergence of successive iterates to a small ball around the population-level minimizer, consequences of the theory in generalized linear models are studied when data are generated from Huber's epsilon-contamination model and/or heavy-tailed distributions. An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed, which may be more appropriate for high-dimensional settings, and conjectures about the convergence of the resulting algorithm are offered. Compared to robust gradient descent, the proposed…
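The iteration described in the abstract, a Newton step in which the gradient and Hessian are replaced by robust mean estimates of their per-sample counterparts, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses squared loss (linear regression) as the simplest generalized linear model, and a coordinate-wise median-of-means estimator as a stand-in for the robust mean estimators the paper draws on. The names `median_of_means`, `robust_newton_least_squares`, `n_blocks`, and `ridge` are hypothetical.

```python
import numpy as np

def median_of_means(samples, n_blocks):
    """Coordinate-wise median of block means along axis 0 (assumed estimator)."""
    blocks = np.array_split(samples, n_blocks, axis=0)
    block_means = np.stack([b.mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)

def robust_newton_least_squares(X, y, n_iter=10, n_blocks=40, ridge=1e-8):
    """Newton iterations with robustly estimated gradient and Hessian.

    Per-sample loss: 0.5 * (x_i^T theta - y_i)^2.
    """
    n, d = X.shape
    theta = np.zeros(d)
    # Per-sample Hessians x_i x_i^T, shape (n, d, d); constant for squared loss.
    hessians = X[:, :, None] * X[:, None, :]
    for _ in range(n_iter):
        residuals = X @ theta - y
        grads = residuals[:, None] * X          # per-sample gradients, (n, d)
        g = median_of_means(grads, n_blocks)    # robust gradient estimate
        H = median_of_means(hessians, n_blocks) # robust Hessian estimate
        H = H + ridge * np.eye(d)               # small ridge for numerical safety
        theta = theta - np.linalg.solve(H, g)   # Newton step
    return theta
```

A call such as `robust_newton_least_squares(X, y)` returns the final iterate. Median-of-means only resists outliers when they fall in fewer than half of the blocks, so `n_blocks` would need to grow with the expected number of corrupted samples. Forming all per-sample Hessians costs O(n d^2) memory, which is one reason a matrix-free direction solver may be preferable in high dimensions.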
