Reweighting Improves Conditional Risk Bounds

Yikai Zhang; Jiahe Lin; Fengpei Li; Songzhu Zheng; Anant Raj; Anderson Schneider; Yuriy Nevmyvaka

arXiv:2501.02353·cs.LG·January 7, 2025

TL;DR

This paper shows that, under a Bernstein condition, weighted empirical risk minimization (weighted ERM) can outperform standard ERM in specific sub-regions of the data: large-margin regions in classification and low-variance regions in heteroscedastic regression. Synthetic experiments support the theory.

Contribution

It introduces a weighted ERM approach that leverages data-dependent weights to improve risk bounds in certain sub-regions, under a general Bernstein condition.
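For orientation, the weighted ERM objective described above can be written as follows. This is a sketch in our own notation, not the paper's: the symbols $\mathcal{F}$, $\ell$, $w$, $f^*$, and $B$ are assumptions introduced here, and the weight is written as a function of the input only for simplicity. The second display is the standard Bernstein condition from the learning-theory literature; the paper's "balanceable" variant may differ and is not reproduced here.

```latex
% Weighted ERM: minimize a reweighted empirical risk over a hypothesis class F.
% Standard ERM is recovered with the constant weight w \equiv 1.
\hat{f}_w \in \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} w(x_i)\, \ell\bigl(f(x_i), y_i\bigr)

% Standard Bernstein condition (illustrative form), with f^* the risk minimizer in F:
\mathbb{E}\Bigl[\bigl(\ell(f(X), Y) - \ell(f^*(X), Y)\bigr)^2\Bigr]
  \le B \, \mathbb{E}\bigl[\ell(f(X), Y) - \ell(f^*(X), Y)\bigr]
  \quad \text{for all } f \in \mathcal{F}
```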

Findings

01. Weighted ERM achieves better bounds in large-margin classification regions.

02. Weighted ERM improves performance in low-variance heteroscedastic regression.

03. Synthetic data experiments support the theoretical advantages.

Abstract

In this work, we study the weighted empirical risk minimization (weighted ERM) schema, in which an additional data-dependent weight function is incorporated when the empirical risk function is being minimized. We show that under a general "balanceable" Bernstein condition, one can design a weighted ERM estimator to achieve superior performance in certain sub-regions over the one obtained from standard ERM, and the superiority manifests itself through a data-dependent constant term in the error bound. These sub-regions correspond to large-margin ones in classification settings and low-variance ones in heteroscedastic regression settings, respectively. Our findings are supported by evidence from synthetic data experiments.
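To make the heteroscedastic regression setting concrete, here is a minimal sketch of weighted ERM for a linear model with squared loss. It is not the paper's estimator or experimental setup: the inverse-variance weights, the assumption that the noise level is known, the data-generating process, and the names `weighted_erm_linear` and `region_excess_risk` are all hypothetical choices made for illustration.

```python
import numpy as np


def weighted_erm_linear(X, y, w):
    """Minimize sum_i w[i] * (y[i] - X[i] @ theta)**2 over theta.

    Scaling each row by sqrt(w[i]) turns the weighted problem into an
    ordinary least-squares problem.
    """
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta


# Hypothetical synthetic data: the noise level depends on the input x.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])   # intercept + slope design matrix
theta_true = np.array([0.5, 2.0])
sigma = 0.1 + 2.0 * (x > 0)            # low noise for x <= 0, high noise for x > 0
y = X @ theta_true + sigma * rng.normal(size=n)

# Standard ERM uses uniform weights.
theta_erm = weighted_erm_linear(X, y, np.ones(n))

# Weighted ERM with inverse-variance weights (assumes sigma is known).
weights = 1.0 / sigma**2
theta_werm = weighted_erm_linear(X, y, weights)

# Excess squared-error risk restricted to the low-variance sub-region x <= 0.
low = x <= 0


def region_excess_risk(theta):
    return float(np.mean((X[low] @ (theta - theta_true)) ** 2))


print("ERM  excess risk on low-variance region:", region_excess_risk(theta_erm))
print("WERM excess risk on low-variance region:", region_excess_risk(theta_werm))
```

Comparing the two printed values gives a rough feel for how reweighting can change the error inside a low-variance sub-region. In practice the noise level would have to be estimated from data, which this sketch does not attempt.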

Taxonomy

Topics: Statistical Methods and Inference · Imbalanced Data Classification Techniques · Explainable Artificial Intelligence (XAI)