Practical machine learning is learning on small samples

Marina Sapir

arXiv:2501.01836·cs.LG·January 6, 2025


Open Access

TL;DR

This paper argues that practical machine learning relies on the assumption of smooth underlying dependencies, enabling effective learning from small samples by selecting hypotheses that smoothly approximate data.

Contribution

It introduces the Practical learning paradigm, formalizes its concepts, and shows that popular learners are implementations of this paradigm.

Findings

01 Popular learners are implementations of the Practical learning paradigm.

02 Smoothness assumption underpins effective learning from small samples.

03 Formalization of the paradigm guides the design of practical machine learning methods.

Abstract

Based on limited observations, machine learning discerns a dependence which is expected to hold in the future. What makes it possible? Statistical learning theory imagines an indefinitely increasing training sample to justify its approach. In reality, there is no infinite time or even an infinite general population for learning. Here I argue that practical machine learning is based on an implicit assumption that the underlying dependence is relatively "smooth": most likely, there are no abrupt differences in feedback between cases with close data points. From this point of view, learning should involve selecting the hypothesis that "smoothly" approximates the training set. I formalize this as the Practical learning paradigm. The paradigm includes terminology and rules for describing learners. Popular learners (local smoothing, k-NN, decision trees, Naive Bayes, SVM for classification and for…
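The smoothness idea in the abstract can be illustrated with a minimal k-NN sketch. This is not the paper's formalization: the function name `knn_predict`, the one-dimensional data points, the numeric feedback values, and the choice of `k` are all assumptions made for illustration. The sketch predicts feedback for a new point by averaging the feedback of its nearest training cases, so in this example two close query points receive nearly identical predictions.

```python
# Illustrative sketch only: k-NN as a "smooth" approximation of a small sample.
# The helper name and the data below are hypothetical, not taken from the paper.

def knn_predict(train, x, k=3):
    """Average the feedback of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# A small training set of (data point, feedback) pairs.
train = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]

# Two close query points share the same three nearest neighbours here,
# so their predictions agree up to floating-point rounding.
pred_a = knn_predict(train, 1.9)
pred_b = knn_predict(train, 2.1)
```

With only five training cases, the prediction relies entirely on the assumption that nearby data points have similar feedback. A larger `k` averages over more neighbours and gives a smoother, less locally sensitive hypothesis.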


Taxonomy

Topics: Machine Learning and Data Classification

Methods: k-Nearest Neighbors · Support Vector Machine