A U-statistic estimator for the variance of resampling-based error estimators

Mathias Fuchs; Roman Hornung; Riccardo De Bin; Anne-Laure Boulesteix

arXiv:1310.8203·math.ST·December 19, 2013·1 cites

Open Access

TL;DR

This paper introduces a U-statistic-based estimator for the variance of resampling-based error estimators in binary classification. The estimator is unbiased and yields an asymptotically exact hypothesis test for comparing the error rates of two learning algorithms.

Contribution

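It develops an unbiased U-statistic estimator for the variance of the error rate estimator that averages over all learning-testing splits. The estimator exists when the total sample size is at least twice the learning set size plus two, and it supports an asymptotically exact test of whether two learning algorithms have equal error rates.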

Findings

01

The error rate estimator involving all learning-testing splits is a U-statistic and therefore has minimal variance among all unbiased estimators.

02

This error rate estimator is asymptotically normally distributed.

03

The variance estimator yields an asymptotically exact hypothesis test of the equality of error rates when two learning algorithms are compared.

Abstract

We revisit resampling procedures for error estimation in binary classification in terms of U-statistics. In particular, we exploit the fact that the error rate estimator involving all learning-testing splits is a U-statistic. Thus, it has minimal variance among all unbiased estimators and is asymptotically normally distributed. Moreover, there is an unbiased estimator for this minimal variance if the total sample size is at least the double learning set size plus two. In this case, we exhibit such an estimator which is another U-statistic. It enjoys, again, various optimality properties and yields an asymptotically exact hypothesis test of the equality of error rates when two learning algorithms are compared. Our statements apply to any deterministic learning algorithms under weak non-degeneracy assumptions.
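The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy learner `nearest_mean_learner`, the function names, and the symbol `g` for the learning set size are assumptions made for illustration. The sketch averages the test error over all learning-testing splits (a U-statistic whose kernel acts on g + 1 observations) and then forms one standard unbiased variance estimate, the squared statistic minus an unbiased estimate of the squared error rate built from pairs of disjoint blocks. The paper's estimator may be written in a different form. The brute-force enumeration is only feasible for very small samples.

```python
from itertools import combinations
from statistics import mean


def nearest_mean_learner(train):
    """Hypothetical deterministic learner: 1-D nearest-class-mean rule."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    if not xs0 or not xs1:
        # Only one class seen: predict that class for every input.
        label = 1 if xs1 else 0
        return lambda x: label
    m0, m1 = mean(xs0), mean(xs1)
    return lambda x: 0 if abs(x - m0) <= abs(x - m1) else 1


def kernel(block, learner):
    """Symmetrized kernel on g + 1 observations.

    Each observation serves once as the test point while the remaining
    g observations form the learning set; the 0-1 losses are averaged.
    """
    losses = []
    for i, (x, y) in enumerate(block):
        train = block[:i] + block[i + 1:]
        losses.append(float(learner(train)(x) != y))
    return mean(losses)


def complete_error_estimate(data, g, learner):
    """Average of the kernel over all subsets of size g + 1.

    This equals the average test error over all learning-testing splits
    with learning set size g and a single test observation.
    """
    return mean([kernel(list(b), learner) for b in combinations(data, g + 1)])


def variance_estimate(data, g, learner):
    """Unbiased estimate of the variance of complete_error_estimate.

    Uses Var(U) = E[U^2] - theta^2, where theta^2 is estimated without
    bias by averaging kernel products over disjoint pairs of blocks.
    The result can be negative in small samples.
    """
    n = len(data)
    if n < 2 * g + 2:
        raise ValueError("need total sample size >= 2 * g + 2")
    idx = range(n)
    vals = {b: kernel([data[i] for i in b], learner)
            for b in combinations(idx, g + 1)}
    products = []
    for a in vals:
        rest = [i for i in idx if i not in a]
        for b in combinations(rest, g + 1):
            products.append(vals[a] * vals[b])
    u = complete_error_estimate(data, g, learner)
    return u * u - mean(products)
```

The sample size check mirrors the condition stated in the abstract: two disjoint blocks of g + 1 observations are needed, so the total sample size must be at least 2g + 2. A call such as `variance_estimate(data, 2, nearest_mean_learner)` with six labelled points `(x, y)` exercises the smallest admissible case for g = 2.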

Taxonomy

Topics: Statistical Methods and Inference · Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms