Distribution-free Deviation Bounds and The Role of Domain Knowledge in Learning via Model Selection with Cross-validation Risk Estimation
Diego Marcondes, Cláudia Peixoto

TL;DR
This paper establishes distribution-free deviation bounds for model selection with cross-validation risk estimation, emphasizing the importance of domain knowledge in choosing candidate models to enhance learning generalization.
Contribution
It introduces a systematic framework within statistical learning theory, providing deviation bounds based on VC dimension and formalizing Learning Spaces influenced by domain knowledge.
Findings
Distribution-free deviation bounds are derived for model selection.
Modeling candidate collections via domain knowledge can improve generalization.
The framework applies to both bounded and unbounded loss functions.
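For orientation, the general shape of a distribution-free deviation bound in terms of VC dimension can be written as follows. This is the classical textbook form for binary classification under the 0-1 loss, not the specific result derived in the paper. The symbols used here ($\mathcal{H}$ for a hypothesis class, $d$ for its VC dimension, $n$ for the sample size, $L$ and $L_n$ for the true and empirical risks, $C$ for a universal constant) are notational assumptions made only for this illustration.

```latex
% Classical uniform deviation bound (illustrative, not the paper's theorem).
% For any data distribution, with probability at least 1 - \delta over an
% i.i.d. sample of size n:
\[
  \sup_{h \in \mathcal{H}} \bigl| L(h) - L_n(h) \bigr|
  \;\le\;
  C \sqrt{\frac{d + \log(1/\delta)}{n}} ,
  \qquad d = \mathrm{VCdim}(\mathcal{H}) .
\]
```

The bound holds whatever the data-generating distribution is, which is the sense of "distribution-free", and it tightens as the VC dimension $d$ of the class shrinks relative to $n$.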
Abstract
Cross-validation techniques for risk estimation and model selection are widely used in statistics and machine learning. However, the theoretical properties of learning via model selection with cross-validation risk estimation remain poorly understood relative to its widespread use. In this context, this paper presents learning via model selection with cross-validation risk estimation as a general systematic learning framework within classical statistical learning theory and establishes distribution-free deviation bounds in terms of VC dimension, giving detailed proofs of the results and considering both bounded and unbounded loss functions. In particular, we investigate how the generalization of learning via model selection may be increased by modeling the collection of candidate models. We define the Learning Spaces as a class of candidate models in which the partial order by…
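The general procedure the abstract refers to, learning via model selection with cross-validation risk estimation, can be sketched roughly as below. This is a minimal illustration under assumptions that are not taken from the paper: the candidate models are nested polynomial classes indexed by degree, the loss is the squared error, the risk estimator is k-fold cross-validation, and all function names (`kfold_indices`, `cv_risk`, `select_and_learn`) are hypothetical.

```python
import numpy as np


def kfold_indices(n, k, rng):
    """Split a random permutation of range(n) into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)


def cv_risk(x, y, degree, folds):
    """Estimate the risk of the degree-`degree` polynomial class by k-fold CV."""
    fold_risks = []
    for val_idx in folds:
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[val_idx] = False
        # Learn on the training part, evaluate squared loss on the held-out fold.
        coefs = np.polyfit(x[train_mask], y[train_mask], degree)
        pred = np.polyval(coefs, x[val_idx])
        fold_risks.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(fold_risks))


def select_and_learn(x, y, degrees, k=5, seed=0):
    """Pick the candidate model with least estimated risk, then refit on all data."""
    rng = np.random.default_rng(seed)
    folds = kfold_indices(len(x), k, rng)  # same folds for every candidate
    risks = {d: cv_risk(x, y, d, folds) for d in degrees}
    best = min(risks, key=risks.get)
    coefs = np.polyfit(x, y, best)  # final hypothesis learned on the whole sample
    return best, coefs, risks


# Synthetic data (an assumption for the demo): quadratic signal plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.1, 200)

best_degree, coefs, risks = select_and_learn(x, y, degrees=range(0, 6))
print("selected degree:", best_degree)
```

Restricting `degrees` to a small, well-motivated range is a toy analogue of using domain knowledge to shape the collection of candidate models: fewer and simpler candidates leave less room for the selection step itself to overfit.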
Taxonomy
Topics: Statistical Methods and Inference · Machine Learning and Algorithms · Gaussian Processes and Bayesian Inference
