Nonlinear Monte Carlo Method for Imbalanced Data Learning

Xuli Shen; Qing Xu; Xiangyang Xue

arXiv:2010.14060 · cs.LG · December 1, 2022

Open Access

TL;DR

This paper introduces a nonlinear Monte Carlo method that improves learning on imbalanced data by replacing the mean loss with the maximum subgroup loss, enhancing robustness and reducing training steps.

Contribution

It proposes a novel nonlinear Monte Carlo approach based on nonlinear expectation theory to better handle imbalanced data in machine learning.

Findings

01 Outperforms state-of-the-art models on imbalanced classification tasks.

02 Requires fewer training steps for convergence.

03 Provides increased robustness in regression and classification.

Abstract

For basic machine learning problems, expected error is used to evaluate model performance. Since the distribution of the data is usually unknown, a simple hypothesis is that the data are sampled independently and identically distributed (i.i.d.), so that the mean value of the loss function can be used as the empirical risk by the Law of Large Numbers (LLN). This is known as the Monte Carlo method. However, when the LLN is not applicable, as in imbalanced data problems, the empirical risk can cause overfitting and may decrease robustness and generalization ability. Inspired by the framework of nonlinear expectation theory, we substitute the mean value of the loss function with the maximum value of the subgroup mean loss. We call this the nonlinear Monte Carlo method. To apply numerical optimization methods, we linearize and smooth the functional of the maximum empirical risk and get the descent direction via…
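
The substitution described in the abstract can be illustrated with a small sketch. The Python below is not the authors' implementation. It assumes subgroups are given by an integer label per sample (the abstract does not say how subgroups are formed), and the names `mean_loss`, `max_subgroup_mean_loss`, and `smooth_max` are hypothetical. The log-sum-exp smoothing is a generic surrogate for the maximum, shown only to indicate why a smooth version is convenient for gradient-based optimization; the paper's own linearization and smoothing may differ.

```python
import numpy as np

def mean_loss(losses):
    """Ordinary empirical risk: the plain average of per-sample losses."""
    return float(np.mean(losses))

def max_subgroup_mean_loss(losses, group_ids):
    """Largest per-subgroup mean loss.

    Assumption: each sample carries a subgroup id (here, an integer label).
    """
    losses = np.asarray(losses, dtype=float)
    group_ids = np.asarray(group_ids)
    group_means = [losses[group_ids == g].mean() for g in np.unique(group_ids)]
    return float(max(group_means))

def smooth_max(values, temperature=0.1):
    """Log-sum-exp surrogate for max (generic; not necessarily the paper's smoothing)."""
    v = np.asarray(values, dtype=float) / temperature
    m = v.max()  # subtract the max before exponentiating for numerical stability
    return float(temperature * (m + np.log(np.exp(v - m).sum())))

# Hypothetical imbalanced batch: 8 majority samples with low loss,
# 2 minority samples with high loss.
losses = np.array([0.1] * 8 + [2.0] * 2)
group_ids = np.array([0] * 8 + [1] * 2)

avg = mean_loss(losses)                             # dominated by the majority group
worst = max_subgroup_mean_loss(losses, group_ids)   # driven by the minority group
print(avg, worst)
```

In this toy batch the plain mean is pulled toward the majority group's low loss, while the maximum subgroup mean tracks the minority group, which is the behavior the abstract motivates for imbalanced data.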

Taxonomy

Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Advanced Statistical Process Monitoring