Reducing Class-Wise Performance Disparity via Margin Regularization

Beier Zhu; Kesen Zhao; Jiequan Cui; Qianru Sun; Yuan Zhou; Xun Yang; Hanwang Zhang

arXiv:2602.00205·cs.LG·February 3, 2026

TL;DR

This paper introduces MR$^2$, a theoretically grounded regularization method that dynamically adjusts margins in neural networks to reduce class-wise accuracy disparities, especially improving performance on hard classes without sacrificing overall accuracy.

Contribution

The paper proposes a novel margin regularization technique, MR$^2$, with a theoretical analysis and practical implementation that effectively reduces class-wise performance disparity in neural networks.

Findings

01

MR$^2$ improves accuracy on hard classes across datasets.

02

The method reduces performance disparity without sacrificing accuracy on easy classes.

03

Experiments on ImageNet and other datasets validate the effectiveness of MR$^2$.
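
The findings above are stated in terms of hard-class accuracy and class-wise disparity. As a minimal sketch of how such quantities could be measured, the snippet below computes per-class accuracy and a few summary statistics. The function names, the `hard_fraction` parameter, and the choice of statistics (worst class, best-to-worst gap, mean over the hardest classes) are assumptions made for illustration, not the paper's evaluation protocol.

```python
from collections import defaultdict


def per_class_accuracy(labels, preds):
    """Return {class: accuracy}, computed over the samples of each class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}


def disparity_summary(labels, preds, hard_fraction=0.5):
    """Summarise class-wise disparity (hypothetical choice of statistics)."""
    acc = per_class_accuracy(labels, preds)
    ranked = sorted(acc.values())  # lowest-accuracy ("hard") classes first
    k = max(1, int(len(ranked) * hard_fraction))
    return {
        "mean": sum(ranked) / len(ranked),
        "worst": ranked[0],
        "gap": ranked[-1] - ranked[0],
        "hard_mean": sum(ranked[:k]) / k,
    }


# Toy, class-balanced example: 3 classes with 4 samples each.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
preds = [0, 0, 0, 0, 1, 1, 1, 0, 2, 0, 1, 2]
summary = disparity_summary(labels, preds)
print(summary)
```

In this toy data the classes are balanced, yet per-class accuracy still ranges from 0.5 to 1.0, which is the kind of gap the paper targets.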

Abstract

Deep neural networks often exhibit substantial disparities in class-wise accuracy, even when trained on class-balanced data, posing concerns for reliable deployment. While prior efforts have explored empirical remedies, a theoretical understanding of such performance disparities in classification remains limited. In this work, we present Margin Regularization for Performance Disparity Reduction (MR$^2$), a theoretically principled regularization for classification by dynamically adjusting margins in both the logit and representation spaces. Our analysis establishes a margin-based, class-sensitive generalization bound that reveals how per-class feature variability contributes to error, motivating the use of larger margins for hard classes. Guided by this insight, MR$^2$ optimizes per-class logit margins proportional to feature spread and penalizes excessive representation margins to…
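
The abstract describes per-class logit margins that scale with feature spread. The sketch below illustrates that general idea only: it measures spread as the mean distance to the class centroid, sets each margin proportional to spread, and subtracts the margin from the true-class logit before a softmax cross-entropy (the style used by margin losses such as LDAM). The spread measure, the normalisation by mean spread, the `scale` parameter, and all function names are assumptions; the paper's actual objective, including its representation-margin penalty, is not reproduced here.

```python
import math


def class_spread(features, labels):
    """Mean Euclidean distance of each class's features to its centroid."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    spread = {}
    for c, xs in by_class.items():
        dim = len(xs[0])
        centroid = [sum(x[d] for x in xs) / len(xs) for d in range(dim)]
        spread[c] = sum(math.dist(x, centroid) for x in xs) / len(xs)
    return spread


def margins_from_spread(spread, scale=1.0):
    """Hypothetical rule: margin proportional to spread, relative to the mean."""
    mean_spread = sum(spread.values()) / len(spread)
    if mean_spread == 0:
        return {c: 0.0 for c in spread}
    return {c: scale * s / mean_spread for c, s in spread.items()}


def margin_cross_entropy(logits, label, margins):
    """Cross-entropy after subtracting the class margin from the true logit."""
    z = list(logits)
    z[label] -= margins[label]
    m = max(z)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(v - m) for v in z))
    return log_sum - z[label]


# Toy features: class 0 is more spread out than class 1.
features = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 1.0]]
labels = [0, 0, 1, 1]
spread = class_spread(features, labels)
margins = margins_from_spread(spread)

no_margin = {c: 0.0 for c in margins}
loss_plain = margin_cross_entropy([2.0, 1.0], 0, no_margin)
loss_margin = margin_cross_entropy([2.0, 1.0], 0, margins)
print(spread, margins, loss_plain, loss_margin)
```

Under these assumptions the class with the larger spread receives the larger margin, so the same logits incur a higher loss for it, pushing training toward a wider separation for that class.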

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 5

Strengths

The paper is clearly written, and the ideas it explores are relevant to our broader understanding of optimization. The theoretical outline is well presented and easy to follow, and the experimental setup is generally sound and consistent with prior work. The proposed approach makes a meaningful contribution by improving performance on hard classes without hurting the easier ones, leading to a more balanced overall accuracy across classes.

Weaknesses

**W1)**: I believe this work, given its focus on margin geometry and embedding compactness, overlooks a closely related and highly relevant area known as Neural Collapse (Papyan et al., 2020). This phenomenon shows that in over-parameterized networks—such as those considered in this paper—class embeddings tend to collapse to a single prototype per class with maximal inter-class separation as training progresses. Subsequent works have analyzed Neural Collapse under class imbalance (Behnia et al.,

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. The paper introduces a theoretically motivated solution to the problem.
2. The paper is well structured with relevant experiments.

Weaknesses

1. Experiments are done on an older setup: I find that the experiments are done on older SOTA setups. Newer setups, such as Sharpness-Aware Minimization (SAM) [R1] and WideResNets, have not been considered for comparison. Hence, the performance reported for datasets like CIFAR-10 and ImageNet is much lower than the current SOTA. Further, the margin-based algorithms like LDAM, compared with MR$^2$, perform much better when compared to SAM [R2].
2. Missing Comparison: There are some contrastive learni

Reviewer 03 · Rating 6 · Confidence 4

Strengths

* By providing background on the class disparity problem, an increasingly critical issue in modern classification settings, and analyzing its underlying causes, this study effectively establishes the motivation for addressing this problem.
* The study also supports the validity of the proposed approach with solid theoretical analysis, comprising precisely stated theorems and corresponding proofs.
* Furthermore, extensive experiments conducted on a wide range of datasets, including fine-grained ben

Weaknesses

**W1.** As illustrated in Eq. 13 (lines 286–291), there exists a trade-off between the first and second terms depending on the value of $\gamma$. Although this trade-off is bounded through Corollary 1, further tuning of the coefficient $\bar{c}$ is still required. This remains a tuning issue, in combination with another hyperparameter $\lambda$, which increases the overall burden of hyperparameter tuning.

**W2.** The proposed approach indirectly verifies its effectiveness in addressing the clas

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning