Statistical Verification of Linear Classifiers

Anton Zhiyanov; Alexander Shklyaev; Alexey Galatenko; Vladimir Galatenko; Alexander Tonevitsky

arXiv:2501.14430 · stat.ML · January 27, 2025

TL;DR

This paper introduces a statistical homogeneity test for evaluating whether a linear classifier genuinely distinguishes between two classes, and applies it to gene expression data for predicting ER-positive breast cancer recurrence.
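The page does not spell out the paper's exact test statistic, so the following is only a hedged illustration of the underlying question ("is this linear classifier better than random?"). It assumes the statistic is the accuracy of the best linear classifier on two 2D samples, approximated by scanning a finite grid of directions, and calibrates it with a label-permutation test. The function names `best_linear_accuracy` and `permutation_p_value`, the accuracy statistic, and the permutation calibration are all assumptions for illustration, not the authors' method; the paper instead derives an upper bound on the test's p-value.

```python
import numpy as np


def best_linear_accuracy(z, labels, n_angles=180):
    """Approximate accuracy of the best linear classifier on 2D points.

    ASSUMPTION: the statistic is plain accuracy, maximized over a finite
    grid of directions in [0, pi) and over all thresholds along each one.
    Ties in projected values are ignored (they have probability zero for
    continuous data).
    """
    n = len(labels)
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    directions = np.column_stack([np.cos(angles), np.sin(angles)])
    best = 0.0
    for w in directions:
        order = np.argsort(z @ w)                    # sort points along direction w
        is0 = (labels[order] == 0).astype(int)
        c0 = np.concatenate([[0], np.cumsum(is0)])   # class-0 count among the first k points
        k = np.arange(n + 1)
        n1 = n - is0.sum()                           # total number of class-1 points
        # First k points predicted as class 0, the remaining n - k as class 1.
        correct = c0 + (n1 - (k - c0))
        acc = correct / n
        # 1 - acc is the accuracy of the same split with the orientation flipped.
        best = max(best, acc.max(), (1.0 - acc).max())
    return best


def permutation_p_value(x, y, n_perm=999, rng=None):
    """Hypothetical permutation test: could the observed separability arise by chance?"""
    rng = np.random.default_rng(rng)
    z = np.vstack([x, y])
    labels = np.r_[np.zeros(len(x), int), np.ones(len(y), int)]
    observed = best_linear_accuracy(z, labels)
    count = 0
    for _ in range(n_perm):
        # Under homogeneity the labels are exchangeable, so shuffle them.
        if best_linear_accuracy(z, rng.permutation(labels)) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=(30, 2))
    y = rng.normal(3.0, 1.0, size=(30, 2))
    acc, p = permutation_p_value(x, y, n_perm=199, rng=1)
    print(f"best accuracy {acc:.3f}, permutation p-value {p:.3f}")
```

The permutation step is exact but costly, since it re-optimizes the classifier for every shuffle. A closed-form upper bound on the p-value, which is what the paper provides for two-dimensional samples, avoids that cost.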

Contribution

It derives an upper bound on the test's p-value for two-dimensional samples. For normally distributed samples, experiments show that the bound is highly accurate, which makes it a practical tool for validating classifiers.
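The accuracy of the bound for normally distributed samples is assessed experimentally. One hedged way to obtain a reference value that such a bound could be compared against is Monte Carlo simulation: draw both samples from the same 2D standard normal (the homogeneity null), recompute the statistic many times, and estimate the tail probability P(T ≥ t). The statistic below is an assumed best-linear-accuracy measure over a grid of directions, not necessarily the paper's, and `null_tail_probability` is a hypothetical helper; the paper's own bound is not reproduced here.

```python
import numpy as np


def best_linear_accuracy(z, labels, n_angles=90):
    """ASSUMED statistic: best accuracy of a linear split, vectorized over a direction grid."""
    n = len(labels)
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    proj = z @ np.vstack([np.cos(angles), np.sin(angles)])   # shape (n, n_angles)
    order = np.argsort(proj, axis=0)                          # per-direction sort order
    is0 = (labels[order] == 0).astype(int)                    # class-0 indicators, sorted
    c0 = np.vstack([np.zeros((1, n_angles), int), np.cumsum(is0, axis=0)])
    k = np.arange(n + 1)[:, None]                             # split position
    n1 = n - is0[:, 0].sum()                                  # class-1 total (same for every direction)
    acc = (c0 + n1 - (k - c0)) / n                            # first k points -> class 0
    return max(acc.max(), (1.0 - acc).max())                  # allow both orientations


def null_tail_probability(t, n0, n1, n_sim=2000, rng=None):
    """Monte Carlo estimate of P(T >= t) when both samples share one 2D standard normal."""
    rng = np.random.default_rng(rng)
    labels = np.r_[np.zeros(n0, int), np.ones(n1, int)]
    hits = 0
    for _ in range(n_sim):
        z = rng.standard_normal((n0 + n1, 2))                 # homogeneous null sample
        if best_linear_accuracy(z, labels) >= t:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    for t in (0.60, 0.65, 0.70, 0.75):
        print(t, null_tail_probability(t, n0=20, n1=20, n_sim=1000, rng=0))
```

Using the standard normal loses nothing for the exact (all-direction) statistic: an invertible affine map sends half-planes to half-planes, so the null distribution is the same for any common nondegenerate 2D normal. The finite direction grid preserves this only approximately.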

Findings

1. The upper bound accurately estimates p-values for two-dimensional normal samples.
2. The test confirms the significance of the IGFBP6 and ELOVL5 genes in ER-positive breast cancer recurrence.
3. Classifiers built on gene pairs effectively distinguish samples by recurrence status.

Abstract

We propose a homogeneity test closely related to the concept of linear separability between two samples. The test answers whether a linear classifier is merely "random" or effectively captures differences between the two classes. We focus on establishing upper bounds for the test's p-value when it is applied to two-dimensional samples. Specifically, for normally distributed samples we demonstrate experimentally that the upper bound is highly accurate. Using this bound, we evaluate classifiers designed to detect ER-positive breast cancer recurrence based on gene pair expression. Our findings confirm the significance of the IGFBP6 and ELOVL5 genes in this process.

Code & Models

Repositories

zhiyanov/random-classifier (official)

Taxonomy

Topics: Advanced Statistical Methods and Models

Methods: Focus