Ratio law: mathematical descriptions for a universal relationship between AI performance and input samples
Boming Kang, Qinghua Cui

TL;DR
This paper uncovers a ratio law linking AI performance to data sample ratios, mathematically proves optimal performance on balanced datasets, and demonstrates how this insight can improve model accuracy across various tasks.
Contribution
It introduces a novel ratio law with concise equations connecting AI performance and data imbalance, and shows how these equations can guide performance enhancement strategies.
Findings
Performance improves with balanced datasets
Equations accurately predict performance based on data ratios
Ensemble strategies guided by equations outperform traditional methods
Abstract
Artificial intelligence based on machine learning and deep learning has made significant advances in various fields such as protein structure prediction and climate modeling. However, a central challenge remains: the "black box" nature of AI, where precise quantitative relationships between inputs and outputs are often lacking. Here, by analyzing 323 AI models trained to predict human essential proteins, we uncovered a ratio law showing that model performance and the ratio of minority to majority samples can be closely linked by two concise equations. Moreover, we mathematically proved that an AI model achieves its optimal performance on a balanced dataset. More importantly, we next explored whether this finding could guide us in enhancing the performance of AI models. To this end, we divided the imbalanced dataset into several balanced subsets to train base classifiers, and then applied a…
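The partitioning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `minority_ratio` and `balanced_subsets`, the binary 0/1 label encoding, and the choice to pair every minority sample with a disjoint, equal-sized chunk of majority samples are all assumptions made for illustration. How the resulting base classifiers are combined is not covered here.

```python
import random
from typing import List, Sequence


def minority_ratio(labels: Sequence[int]) -> float:
    """Ratio of minority-class count to majority-class count (binary 0/1 labels)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return min(pos, neg) / max(pos, neg)


def balanced_subsets(labels: Sequence[int], seed: int = 0) -> List[List[int]]:
    """Split sample indices into balanced subsets (hypothetical helper).

    Assumption: each subset holds all minority indices plus one disjoint,
    equal-sized chunk of shuffled majority indices. Majority samples left
    over after the last full chunk are dropped.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)

    shuffled = list(majority)
    random.Random(seed).shuffle(shuffled)

    k = len(minority)
    n_subsets = len(shuffled) // k
    return [
        sorted(minority + shuffled[j * k:(j + 1) * k])
        for j in range(n_subsets)
    ]


# Toy data: 3 minority (label 1) and 10 majority (label 0) samples.
labels = [1] * 3 + [0] * 10
subsets = balanced_subsets(labels)
print(minority_ratio(labels), len(subsets), [len(s) for s in subsets])
```

With the toy labels above, the ratio of the full dataset is 3/10, and each of the three subsets contains three samples of each class, so every base classifier would be trained at a ratio of 1.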
Taxonomy
Topics: Neural Networks and Applications
Methods: Balanced Selection
