Cross-validation improved by aggregation: Agghoo

Guillaume Maillard (LMO, SELECT, LM-Orsay); Sylvain Arlot (LMO, SELECT, LM-Orsay); Matthieu Lerasle (LMO, SELECT, LM-Orsay)

arXiv:1709.03702 · math.ST · September 13, 2017

TL;DR

Agghoo (aggregated hold-out) mixes cross-validation with aggregation and is related to bagging. In numerical experiments it can significantly reduce prediction error relative to cross-validation at the same computational cost, and it comes with theoretical guarantees in supervised classification.

Contribution

This paper studies Agghoo, an aggregation method related to cross-validation, and provides its first theoretical guarantees in supervised classification, alongside numerical evidence of improved prediction error.

Findings

01

In numerical experiments, Agghoo can significantly improve on cross-validation's prediction error at the same computational cost.

02

Theoretical guarantees show that, at worst, Agghoo performs like the hold-out, up to a constant factor.

03

Agghoo achieves minimax rates under the margin condition in binary classification.

Abstract

Cross-validation is widely used for selecting among a family of learning rules. This paper studies a related method, called aggregated hold-out (Agghoo), which mixes cross-validation with aggregation; Agghoo can also be related to bagging. According to numerical experiments, Agghoo can significantly improve on cross-validation's prediction error at the same computational cost; this makes it very promising as a general-purpose tool for prediction. We provide the first theoretical guarantees on Agghoo, in the supervised classification setting, ensuring that one can use it safely: at worst, Agghoo performs like the hold-out, up to a constant factor. We also prove a non-asymptotic oracle inequality, in binary classification under the margin condition, which is sharp enough to get (fast) minimax rates.
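The abstract does not spell out the procedure, so the following is only a minimal sketch of one way to read "aggregated hold-out" in binary classification. It assumes several random training/validation splits, a hold-out selection on each split, and a majority vote over the selected classifiers. The k-nearest-neighbours family, the function names `knn_predict` and `agghoo_predict`, and the parameters `ks`, `n_splits` and `train_frac` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Plain k-nearest-neighbours classifier for labels in {0, 1} (illustrative)."""
    # Squared Euclidean distance between every test point and every training point.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    # Majority vote among the k neighbours (ties go to class 1).
    return (y_train[nearest].mean(axis=1) >= 0.5).astype(int)

def agghoo_predict(X, y, X_new, ks, n_splits=10, train_frac=0.8, seed=0):
    """Sketch of aggregated hold-out: one hold-out selection per split, then a vote."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(train_frac * n)
    votes = np.zeros(len(X_new))
    for _ in range(n_splits):
        perm = rng.permutation(n)
        tr, va = perm[:n_train], perm[n_train:]
        # Hold-out step: keep the k with the lowest validation error on this split.
        errors = [np.mean(knn_predict(X[tr], y[tr], X[va], k) != y[va]) for k in ks]
        best_k = ks[int(np.argmin(errors))]
        # The selected rule, trained on this split's training part only, casts a vote.
        votes += knn_predict(X[tr], y[tr], X_new, best_k)
    # Aggregation step: majority vote over the splits (ties go to class 1).
    return (votes / n_splits >= 0.5).astype(int)

# Toy usage on two well-separated Gaussian blobs (synthetic data, for illustration only).
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-3, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
X_new = np.concatenate([rng.normal(-3, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y_new = np.array([0] * 20 + [1] * 20)
preds = agghoo_predict(X, y, X_new, ks=[1, 5, 15])
```

In this sketch each split trains every candidate once, which is comparable to the work done by cross-validation with the same number of splits. The difference is that the per-split winners are aggregated rather than a single parameter being chosen from averaged validation errors.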

Taxonomy

Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Machine Learning and Algorithms