Enhancing Data Quality through Self-learning on Imbalanced Financial Risk Data
Xu Sun, Zixuan Qin, Shun Zhang, Yuexian Wang, Li Huang

TL;DR
This paper introduces TriEnhance, a data pre-processing method that improves financial risk prediction models by combining three steps: synthetic minority-class sample generation, filtering, and self-learning. The result is better minority-class calibration.
Contribution
The paper presents TriEnhance, a simple yet effective technique for augmenting imbalanced financial risk datasets through synthetic sample generation, filtering, and pseudo-labeling.
Findings
TriEnhance improves minority class calibration across six benchmark datasets.
The method enhances the detection of high-risk instances in financial datasets.
Results show significant performance gains over baseline models.
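This summary does not say which calibration metric the paper reports. As one possible reading of "minority class calibration", the sketch below computes a Brier score restricted to true minority-class rows. The function name `minority_brier` and the toy probabilities are illustrative assumptions, not values or code from the paper.

```python
def minority_brier(y_true, p_pred, minority_label=1):
    """Brier score over true minority rows only (assumed metric, not the paper's).

    p_pred holds the predicted probability of the minority class for each row.
    Lower is better: 0.0 means every minority row was predicted with p = 1.
    """
    errs = [(1.0 - p) ** 2 for y, p in zip(y_true, p_pred) if y == minority_label]
    if not errs:
        raise ValueError("no minority-class rows in y_true")
    return sum(errs) / len(errs)


# Toy inputs for illustration only; these are not results from the paper.
y_true = [1, 0, 0, 1, 0]
p_underconfident = [0.2, 0.1, 0.3, 0.4, 0.2]  # minority rows scored low
p_confident = [0.7, 0.1, 0.3, 0.8, 0.2]       # minority rows scored high

score_a = minority_brier(y_true, p_underconfident)
score_b = minority_brier(y_true, p_confident)
```

With these toy inputs the second set of predictions gets the lower (better) score, because its probabilities for the two true minority rows are closer to 1.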
Abstract
In the financial risk domain, particularly in credit default prediction and fraud detection, accurate identification of high-risk class instances is paramount, as their occurrence can have significant economic implications. Although machine learning models have gained widespread adoption for risk prediction, their performance is often hindered by the scarcity and diversity of high-quality data. This limitation stems from factors in datasets such as small risk sample sizes, high labeling costs, and severe class imbalance, which impede the models' ability to learn effectively and accurately forecast critical events. This study investigates data pre-processing techniques to enhance existing financial risk datasets by introducing TriEnhance, a straightforward technique that entails: (1) generating synthetic samples specifically tailored to the minority class, (2) filtering using binary…
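The three steps named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The generator is assumed to be SMOTE-style interpolation, and the binary scorer is a nearest-centroid stand-in. Because the abstract text is truncated, the filter criterion is also an assumption, as is the confidence threshold for pseudo-labeling. All function names and parameters are hypothetical.

```python
import random
from statistics import fmean


def centroid(rows):
    """Component-wise mean of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def minority_score(x, c_min, c_maj):
    """Score in [0, 1]; above 0.5 means x is closer to the minority centroid."""
    d_min, d_maj = sq_dist(x, c_min), sq_dist(x, c_maj)
    total = d_min + d_maj
    return 0.5 if total == 0 else d_maj / total


def generate(minority, n_new, rng):
    """Step 1 (assumed SMOTE-style): interpolate between random minority pairs."""
    out = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # needs at least two minority rows
        t = rng.random()
        out.append([x + t * (y - x) for x, y in zip(a, b)])
    return out


def tri_enhance_sketch(minority, majority, unlabeled,
                       n_new=20, threshold=0.8, seed=0):
    """Return (augmented minority rows, augmented majority rows)."""
    rng = random.Random(seed)
    c_min, c_maj = centroid(minority), centroid(majority)

    # Step 1: synthetic minority candidates.
    candidates = generate(minority, n_new, rng)

    # Step 2 (assumed criterion): keep candidates the binary scorer
    # assigns to the minority class.
    kept = [x for x in candidates if minority_score(x, c_min, c_maj) > 0.5]

    # Step 3: pseudo-label unlabeled rows only when the scorer is confident.
    pseudo_min, pseudo_maj = [], []
    for x in unlabeled:
        s = minority_score(x, c_min, c_maj)
        if s >= threshold:
            pseudo_min.append(x)
        elif s <= 1 - threshold:
            pseudo_maj.append(x)

    return minority + kept + pseudo_min, majority + pseudo_maj


# Toy data for illustration only.
minority = [[5.0, 5.0], [6.0, 5.5], [5.5, 6.0]]
majority = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0], [0.2, 0.8], [0.9, 0.1]]
unlabeled = [[5.6, 5.4], [0.3, 0.4], [3.0, 3.0]]

new_min, new_maj = tri_enhance_sketch(minority, majority, unlabeled)
```

In this toy run the ambiguous point `[3.0, 3.0]` sits roughly midway between the two centroids, so it receives no pseudo-label and is left out of both augmented sets. A real pipeline would replace the centroid scorer with a trained binary classifier.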
Taxonomy
Topics: Financial Distress and Bankruptcy Prediction · Imbalanced Data Classification Techniques
Methods: Focus · Self-Learning
