Ratio law: mathematical descriptions for a universal relationship between AI performance and input samples

Boming Kang; Qinghua Cui

arXiv:2411.00913 · cs.LG · November 5, 2024

Open Access

TL;DR

This paper uncovers a ratio law linking AI performance to data sample ratios, mathematically proves optimal performance on balanced datasets, and demonstrates how this insight can improve model accuracy across various tasks.

Contribution

It introduces a novel ratio law with concise equations connecting AI performance and data imbalance, and shows how these equations can guide performance enhancement strategies.

Findings

01. Performance improves with balanced datasets

02. Equations accurately predict performance based on data ratios

03. Ensemble strategies guided by equations outperform traditional methods

Abstract

Artificial intelligence based on machine learning and deep learning has made significant advances in various fields such as protein structure prediction and climate modeling. However, a central challenge remains: the "black box" nature of AI, where precise quantitative relationships between inputs and outputs are often lacking. Here, by analyzing 323 AI models trained to predict human essential proteins, we uncovered a ratio law showing that model performance and the ratio of minority to majority samples can be closely linked by two concise equations. Moreover, we mathematically proved that an AI model achieves its optimal performance on a balanced dataset. More importantly, we next explored whether this finding could guide us to enhance AI models' performance. To that end, we divided the imbalanced dataset into several balanced subsets to train base classifiers, and then applied a…
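The step of dividing an imbalanced dataset into balanced subsets can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `balanced_subsets` and `minority_majority_ratio`, the binary-label restriction, the reuse of all minority samples in every subset, and the dropping of leftover majority samples are all assumptions made for illustration. The abstract is truncated before it says how the base classifiers are combined, so that step is deliberately left out.

```python
import random
from collections import Counter


def minority_majority_ratio(labels):
    """Ratio of minority-class count to majority-class count."""
    counts = Counter(labels).most_common()  # sorted from most to least frequent
    return counts[-1][1] / counts[0][1]


def balanced_subsets(labels, seed=0):
    """Split sample indices into subsets with equal class counts.

    Hypothetical helper: every subset reuses all minority indices and takes
    a disjoint chunk of majority indices of the same size. Majority samples
    that cannot fill a whole chunk are dropped (an assumption of this sketch).
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("this sketch handles binary labels only")
    # most_common() puts the majority class first.
    (majority_label, _), (minority_label, n_minority) = counts.most_common()

    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]

    # Shuffle so each chunk is a random sample of the majority class.
    rng = random.Random(seed)
    rng.shuffle(majority_idx)

    subsets = []
    for start in range(0, len(majority_idx) - n_minority + 1, n_minority):
        chunk = majority_idx[start:start + n_minority]
        subsets.append(sorted(minority_idx + chunk))
    return subsets


# Toy labels: 3 minority samples (1) and 10 majority samples (0).
labels = [1] * 3 + [0] * 10
subsets = balanced_subsets(labels)
print(minority_majority_ratio(labels))
print(len(subsets), [len(s) for s in subsets])
```

Each returned index list could then be used to train one base classifier on a subset whose minority-to-majority ratio is 1, which is the balanced condition the abstract identifies as optimal.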

Taxonomy

Topics: Neural Networks and Applications

Methods: Balanced Selection