Statistical hypothesis testing versus machine-learning binary classification: distinctions and guidelines

Jingyi Jessica Li; Xin Tong

arXiv:2007.01935 · stat.AP · December 1, 2021

TL;DR

This paper clarifies the differences between hypothesis testing and binary classification in data analysis and provides guidelines to help practitioners choose the appropriate strategy for a specific task. The guidelines are demonstrated in a cancer driver gene prediction example.

Contribution

It compares hypothesis testing and binary classification in three aspects and gives five practical guidelines for choosing between them for a specific analysis need.

Findings

01. Distinctions between the two strategies in three key aspects

02. Five practical guidelines for choosing the appropriate strategy

03. Application example in cancer driver gene prediction

Abstract

Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to choose between these two strategies can be unclear and rather confusing. Here we summarize key distinctions between these two strategies in three aspects and list five practical guidelines for data analysts to choose the appropriate strategy for specific analysis needs. We demonstrate the use of those guidelines in a cancer driver gene prediction example.
