Statistical Verification of Linear Classifiers
Anton Zhiyanov, Alexander Shklyaev, Alexey Galatenko, Vladimir Galatenko, Alexander Tonevitsky

TL;DR
This paper introduces a statistical homogeneity test to evaluate whether linear classifiers genuinely distinguish between classes, with a focus on gene expression data for breast cancer recurrence prediction.
Contribution
The paper derives an upper bound on the p-value of the test for two-dimensional samples. Experiments show the bound is highly accurate for normally distributed samples, which makes it practical for validating linear classifiers.
Findings
In experiments, the upper bound closely approximates the p-value for two-dimensional normally distributed samples.
The test confirms the significance of the IGFBP6 and ELOVL5 genes in ER-positive breast cancer recurrence.
The bound is used to evaluate classifiers that detect ER-positive breast cancer recurrence from gene pair expression.
Abstract
We propose a homogeneity test closely related to the concept of linear separability between two samples. Using the test, one can answer the question of whether a linear classifier is merely "random" or effectively captures differences between two classes. We focus on establishing upper bounds for the test's p-value when applied to two-dimensional samples. Specifically, for normally distributed samples we experimentally demonstrate that the upper bound is highly accurate. Using this bound, we evaluate classifiers designed to detect ER-positive breast cancer recurrence based on gene pair expression. Our findings confirm the significance of the IGFBP6 and ELOVL5 genes in this process.
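To illustrate the question the abstract poses (is a linear classifier merely "random"?), here is a minimal sketch of a generic Monte Carlo permutation test on two-dimensional samples. It is not the paper's homogeneity test and does not implement its p-value upper bound. The test statistic (training accuracy of a Fisher linear discriminant), the function names `lda_accuracy` and `permutation_p_value`, and the synthetic Gaussian data are all assumptions made for illustration.

```python
import random


def lda_accuracy(points, labels):
    """Training accuracy of a Fisher linear discriminant on 2D points (labels 0/1)."""
    groups = {0: [], 1: []}
    for p, y in zip(points, labels):
        groups[y].append(p)

    means = {}
    sxx = sxy = syy = 0.0  # pooled within-class scatter matrix entries
    for y, pts in groups.items():
        mx = sum(p[0] for p in pts) / len(pts)
        my = sum(p[1] for p in pts) / len(pts)
        means[y] = (mx, my)
        for px, py in pts:
            sxx += (px - mx) ** 2
            sxy += (px - mx) * (py - my)
            syy += (py - my) ** 2

    det = sxx * syy - sxy * sxy
    if det <= 1e-12:
        return 0.5  # degenerate scatter: treat the classifier as uninformative

    dx = means[1][0] - means[0][0]
    dy = means[1][1] - means[0][1]
    # w = S^{-1} (m1 - m0), using the closed-form inverse of a 2x2 matrix
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    # Decision threshold: projection of the midpoint between the class means
    b = wx * (means[0][0] + means[1][0]) / 2 + wy * (means[0][1] + means[1][1]) / 2

    correct = sum(
        1 for (px, py), y in zip(points, labels)
        if (wx * px + wy * py > b) == (y == 1)
    )
    return correct / len(points)


def permutation_p_value(points, labels, n_perm=500, seed=0):
    """Monte Carlo permutation p-value for the observed training accuracy."""
    rng = random.Random(seed)
    observed = lda_accuracy(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between points and labels
        if lda_accuracy(points, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return (1 + hits) / (1 + n_perm)


# Synthetic demo data (hypothetical, not from the paper)
rng = random.Random(42)
labels = [0] * 30 + [1] * 30
separated = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
             + [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(30)])
same_dist = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]

p_separated = permutation_p_value(separated, labels)
p_null = permutation_p_value(same_dist, labels)
print("well-separated classes:", p_separated)
print("identical distributions:", p_null)
```

A small p-value suggests the classifier's accuracy is unlikely under random labeling. Each permutation refits the classifier, so the cost grows with the number of permutations, which is one practical reason an analytic upper bound on the p-value, as studied in the paper, can be attractive.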
Taxonomy
Topics: Advanced Statistical Methods and Models
Methods: Focus
