Statistical hypothesis testing versus machine-learning binary classification: distinctions and guidelines
Jingyi Jessica Li, Xin Tong

TL;DR
This paper clarifies the differences between hypothesis testing and binary classification in data analysis, providing guidelines to help practitioners choose the appropriate method for specific tasks, demonstrated through a cancer gene prediction example.
Contribution
It offers a clear comparison of hypothesis testing and classification, along with practical guidelines for selecting the suitable approach in various data analysis scenarios.
Findings
Key distinctions between the two strategies, summarized in three aspects
Five practical guidelines for choosing the appropriate method
Application example in cancer driver gene prediction
Abstract
Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to choose between these two strategies can be unclear and rather confusing. Here we summarize key distinctions between these two strategies in three aspects and list five practical guidelines for data analysts to choose the appropriate strategy for specific analysis needs. We demonstrate the use of those guidelines in a cancer driver gene prediction example.
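As a rough illustration of the contrast the abstract draws, the sketch below runs both strategies on the same simulated data. It is not taken from the paper: the function names, the simulated groups, the 0.05 significance level, and the midpoint threshold rule are all assumptions chosen for brevity. The test produces one binary decision about a population-level statement (whether two group means are equal), while the classifier produces one binary decision per new instance.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def two_sample_z_test(x, y):
    """Two-sided large-sample z-test of H0: the two population means are equal.

    Returns the p-value. Hypothetical helper for illustration only.
    """
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fit_threshold_classifier(x0, x1):
    """Fit a one-feature classifier from labeled training data.

    The rule predicts class 1 when a value exceeds the midpoint of the two
    class means. This assumes class 1 has the larger mean, which holds for
    the simulated data below. Hypothetical helper for illustration only.
    """
    threshold = (mean(x0) + mean(x1)) / 2
    return lambda value: int(value > threshold)


# Simulated data (an assumption, not data from the paper).
rng = random.Random(0)
group0 = [rng.gauss(0.0, 1.0) for _ in range(200)]
group1 = [rng.gauss(1.0, 1.0) for _ in range(200)]

# Hypothesis testing: a single decision about the populations.
p_value = two_sample_z_test(group0, group1)
reject_null = p_value < 0.05

# Binary classification: a separate decision for each new instance.
classify = fit_threshold_classifier(group0, group1)
new_instances = [-2.0, 3.0]
labels = [classify(v) for v in new_instances]

print("reject H0:", reject_null)
print("predicted labels:", labels)
```

In this sketch the two outputs answer different questions: `reject_null` says something about the two groups as a whole, whereas `labels` assigns each unseen value to a group.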
