Enhancing Data Quality through Self-learning on Imbalanced Financial Risk Data

Xu Sun; Zixuan Qin; Shun Zhang; Yuexian Wang; Li Huang

arXiv:2409.09792 · cs.LG · September 17, 2024

TL;DR

This paper introduces TriEnhance, a data pre-processing method for financial risk prediction that combines three steps: generating synthetic minority-class samples, filtering, and self-learning. The enhanced data leads to better-calibrated predictions on the minority class.

Contribution

The paper presents TriEnhance, a simple yet effective technique for augmenting imbalanced financial risk datasets through synthetic sample generation, filtering, and pseudo-labeling.

Findings

01

TriEnhance improves minority class calibration across six benchmark datasets.

02

The method enhances the detection of high-risk instances in financial datasets.

03

Results show significant performance gains over baseline models.

Abstract

In the financial risk domain, particularly in credit default prediction and fraud detection, accurate identification of high-risk class instances is paramount, as their occurrence can have significant economic implications. Although machine learning models have gained widespread adoption for risk prediction, their performance is often hindered by the scarcity and diversity of high-quality data. This limitation stems from factors in datasets such as small risk sample sizes, high labeling costs, and severe class imbalance, which impede the models' ability to learn effectively and accurately forecast critical events. This study investigates data pre-processing techniques to enhance existing financial risk datasets by introducing TriEnhance, a straightforward technique that entails: (1) generating synthetic samples specifically tailored to the minority class, (2) filtering using binary…
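The three steps named in the abstract (generate, filter, self-learn) can be illustrated with a small standard-library Python sketch. It is not the authors' implementation: the abstract is truncated before the filtering criterion is stated, so every concrete choice here is an assumption made for illustration. That includes the pairwise interpolation used for generation, the nearest-centroid `risk_score` stand-in for a trained classifier, the thresholds `keep_above`, `hi`, and `lo`, and all function names.

```python
import random
from statistics import fmean

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [fmean(col) for col in zip(*rows)]

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def risk_score(x, c_min, c_maj):
    """Hypothetical scorer in [0, 1]; higher means closer to the minority centroid."""
    d_min, d_maj = dist2(x, c_min), dist2(x, c_maj)
    total = d_min + d_maj
    return 0.5 if total == 0 else d_maj / total

def generate(minority, n_new, rng):
    """Step 1 (assumed): interpolate between random pairs of minority rows.
    Requires at least two minority rows."""
    out = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        out.append([x + t * (y - x) for x, y in zip(a, b)])
    return out

def filter_synthetic(synthetic, c_min, c_maj, keep_above=0.5):
    """Step 2 (assumed): keep synthetic rows the scorer rates as minority-like."""
    return [s for s in synthetic if risk_score(s, c_min, c_maj) > keep_above]

def pseudo_label(unlabeled, c_min, c_maj, hi=0.8, lo=0.2):
    """Step 3 (assumed): label only the unlabeled rows scored with high confidence."""
    labeled = []
    for u in unlabeled:
        p = risk_score(u, c_min, c_maj)
        if p >= hi:
            labeled.append((u, 1))
        elif p <= lo:
            labeled.append((u, 0))
    return labeled

def enhance(majority, minority, unlabeled, n_new, seed=0):
    """Return the original rows plus filtered synthetic and pseudo-labeled rows."""
    rng = random.Random(seed)
    c_min, c_maj = centroid(minority), centroid(majority)
    kept = filter_synthetic(generate(minority, n_new, rng), c_min, c_maj)
    pseudo = pseudo_label(unlabeled, c_min, c_maj)
    X = majority + minority + kept + [u for u, _ in pseudo]
    y = ([0] * len(majority) + [1] * (len(minority) + len(kept))
         + [label for _, label in pseudo])
    return X, y

# Toy data: label 1 is the rare high-risk class.
majority = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.0, 0.2], [0.2, 0.0]]
minority = [[1.0, 1.0], [0.9, 1.1]]
unlabeled = [[0.95, 1.0], [0.1, 0.1], [0.5, 0.6]]

X, y = enhance(majority, minority, unlabeled, n_new=4)
print(len(X), sum(y))
```

In this toy run the ambiguous unlabeled point `[0.5, 0.6]` falls between the two thresholds and is left out, which is the usual motivation for confidence-gated pseudo-labeling. A real pipeline would replace `risk_score` with a trained model.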

Taxonomy

Topics: Financial Distress and Bankruptcy Prediction · Imbalanced Data Classification Techniques

Methods: Focus · Self-Learning