Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm
Pascal Germain, Alexandre Lacasse, François Laviolette, Mario Marchand, Jean-Francis Roy

TL;DR
This paper introduces the C-bound, a risk bound for majority votes in binary classification, gives a PAC-Bayesian analysis showing how the bound can be estimated from training data, and proposes MinCq, a learning algorithm that minimizes the bound and performs competitively with AdaBoost and SVM.
Contribution
It develops a new risk bound called the C-bound, offers a PAC-Bayesian analysis to estimate it, and introduces the MinCq learning algorithm based on minimizing this bound.
Findings
The C-bound accounts for both the average quality of the voters and their average disagreement.
In an empirical comparison with AdaBoost and SVM, MinCq achieves state-of-the-art performance.
The analysis provides new PAC-Bayesian bounds, some without Kullback-Leibler divergence.
Abstract
We propose an extensive analysis of the behavior of majority votes in binary classification. In particular, we introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average disagreement. We also propose an extensive PAC-Bayesian analysis that shows how the C-bound can be estimated from various observations contained in the training data. The analysis intends to be self-contained and can be used as introductory material to PAC-Bayesian statistical learning theory. It starts from a general PAC-Bayesian perspective and ends with uncommon PAC-Bayesian bounds. Some of these bounds contain no Kullback-Leibler divergence and others allow kernel functions to be used as voters (via the sample compression setting). Finally, out of the analysis, we propose the MinCq learning algorithm that basically minimizes the C-bound.…
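To make the two ingredients of the C-bound concrete, here is a minimal sketch that evaluates it on a sample. It assumes the usual statement of the bound, 1 - (1 - 2r)^2 / (1 - 2d), where r is the Gibbs risk (average voter error under the weights Q) and d is the expected disagreement between two voters drawn from Q; the bound applies only when r < 1/2. The function name `empirical_c_bound`, its arguments, and the toy data are hypothetical, chosen for illustration. The value computed is the empirical quantity only, not a PAC-Bayesian guarantee, which would additionally require confidence bounds on r and d.

```python
def empirical_c_bound(votes, labels, weights):
    """Empirical C-bound of a Q-weighted majority vote (illustrative sketch).

    votes   : list of voters, each a list of outputs in {-1, +1}, one per example
    labels  : list of true labels in {-1, +1}
    weights : non-negative voter weights (normalized to a distribution Q here)
    """
    total = sum(weights)
    q = [w / total for w in weights]
    n = len(labels)

    # Margin of the weighted vote on each example: y * sum_i Q_i * h_i(x).
    margins = []
    for j in range(n):
        weighted_vote = sum(q[i] * votes[i][j] for i in range(len(q)))
        margins.append(labels[j] * weighted_vote)

    mu1 = sum(margins) / n                  # first moment  = 1 - 2 * Gibbs risk
    mu2 = sum(m * m for m in margins) / n   # second moment = 1 - 2 * disagreement
    if mu1 <= 0:
        raise ValueError("C-bound requires a Gibbs risk strictly below 1/2")

    return {
        "gibbs_risk": (1 - mu1) / 2,
        "disagreement": (1 - mu2) / 2,
        "c_bound": 1 - mu1 ** 2 / mu2,
        # Ties (margin exactly 0) are counted as errors, a conservative choice.
        "majority_vote_risk": sum(1 for m in margins if m <= 0) / n,
    }


# Toy data: three voters, each wrong on a different example, so they disagree often.
votes = [
    [1, 1, 1, -1],
    [1, 1, -1, 1],
    [1, -1, 1, 1],
]
labels = [1, 1, 1, 1]
result = empirical_c_bound(votes, labels, weights=[1, 1, 1])
print(result)
```

On this toy sample each voter errs on a quarter of the examples, yet the vote itself makes no error. Because the voters disagree, the C-bound (0.25) is tighter than the classical bound of twice the Gibbs risk (0.5).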
Taxonomy
Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
