A Link between Coding Theory and Cross-Validation with Applications
Tapio Pahikkala, Parisa Movahedi, Ileana Montoya, Havu Miikonen, Stephan Foldes, Antti Airola, Laszlo Major

TL;DR
This paper establishes a connection between coding theory and cross-validation in machine learning: the theory of error-detecting codes gives exact answers to how many binary classification problems a learning algorithm can solve with zero, or at most a given number of, cross-validation errors.
Contribution
It introduces the concept of light constant weight codes to analyze cross-validation errors and develops new LPOCV-based statistical tests for learning algorithms.
Findings
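The maximal number of classification problems with fixed class proportion that a learning algorithm can solve with zero LPOCV error equals the maximal number of code words in a constant weight code (CWC) with certain technical properties.
CWCs are generalized to light CWCs, which give an analogous result for nonzero LPOCV errors.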
New LPOCV-based randomization tests for classifiers.
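
To make the first finding concrete: a binary labeling of n data points with exactly k positives is a binary word of length n and Hamming weight k, so a set of labelings with fixed class proportion is a set of equal-weight words. The sketch below enumerates such labelings and checks the usual constant weight code condition, a minimum pairwise Hamming distance. The function names are hypothetical, and the distance threshold is an illustrative assumption: this summary does not spell out the "technical properties" the paper requires.

```python
from itertools import combinations


def labelings_with_k_positives(n, k):
    """All binary label vectors of length n with exactly k ones."""
    words = []
    for ones in combinations(range(n), k):
        words.append(tuple(1 if i in ones else 0 for i in range(n)))
    return words


def hamming(u, v):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


def is_constant_weight_code(words, weight, min_distance):
    """Check equal Hamming weight and a minimum pairwise distance.

    min_distance is an illustrative parameter, not a value taken
    from the paper.
    """
    same_weight = all(sum(w) == weight for w in words)
    far_apart = all(
        hamming(u, v) >= min_distance for u, v in combinations(words, 2)
    )
    return same_weight and far_apart


all_words = labelings_with_k_positives(4, 2)
disjoint_pair = [(1, 1, 0, 0), (0, 0, 1, 1)]
```

Here `all_words` holds every labeling of four points with two positives. Any two distinct words of equal weight differ in at least two positions, so the full set passes with `min_distance=2` but not with `min_distance=4`, while `disjoint_pair` passes with either value.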
Abstract
How many different binary classification problems can a single learning algorithm solve on fixed data with exactly zero, or at most a given number of, cross-validation errors? While the number in the former case is known to be limited by the no-free-lunch theorem, we show that the exact answers are given by the theory of error-detecting codes. As a case study, we focus on the AUC performance measure and leave-pair-out cross-validation (LPOCV), in which every possible pair of data points with different class labels is held out at a time. We show that the maximal number of classification problems with fixed class proportion, for which a learning algorithm can achieve zero LPOCV error, equals the maximal number of code words in a constant weight code (CWC) with certain technical properties. We then generalize CWCs by introducing light CWCs, and prove an analogous result for nonzero LPOCV errors…
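
The leave-pair-out procedure described in the abstract can be sketched as follows: for every positive-negative pair, train on the remaining data, score both held-out points, and count the pair as correctly ranked when the positive point scores higher. This is a minimal sketch under stated assumptions, not the authors' implementation. `centroid_scorer` is a hypothetical toy learner chosen only to keep the example dependency-free, and counting ties as 0.5 is the common AUC convention rather than something stated in this summary.

```python
from itertools import product


def centroid_scorer(train_X, train_y):
    """Hypothetical toy learner (an assumption, not from the paper).

    Returns a scoring function: squared distance to the negative
    class mean minus squared distance to the positive class mean.
    Requires at least one training point of each class.
    """
    dim = len(train_X[0])

    def mean(label):
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    pos, neg = mean(1), mean(0)

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return lambda x: dist2(x, neg) - dist2(x, pos)


def lpocv_auc(X, y, fit=centroid_scorer):
    """AUC estimate from leave-pair-out cross-validation."""
    pos_idx = [i for i, t in enumerate(y) if t == 1]
    neg_idx = [i for i, t in enumerate(y) if t == 0]
    total = 0.0
    for i, j in product(pos_idx, neg_idx):
        # Hold out one positive and one negative point, train on the rest.
        keep = [k for k in range(len(y)) if k not in (i, j)]
        score = fit([X[k] for k in keep], [y[k] for k in keep])
        si, sj = score(X[i]), score(X[j])
        # Correct ranking counts 1, a tie 0.5 (assumed convention).
        total += 1.0 if si > sj else 0.5 if si == sj else 0.0
    return total / (len(pos_idx) * len(neg_idx))


X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = [0, 0, 0, 1, 1, 1]
auc = lpocv_auc(X, y)
```

With this toy learner each class needs at least two points, so that one of each remains in every training fold. On the well-separated data above every held-out pair is ranked correctly, which corresponds to the zero LPOCV error case the abstract discusses.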
Taxonomy
Topics: Numerical Methods and Algorithms · Neural Networks and Applications · Algorithms and Data Compression
