Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds and Comparison with KL-divergence based Optimal Posteriors and Cross-Validation Procedure
Puja Sahu, Nandyala Hemachandra

TL;DR
This paper compares chi-squared divergence based PAC-Bayesian bounds with KL-divergence based bounds, deriving optimal posteriors, analyzing their properties, and evaluating their performance on classifiers, highlighting differences in bounds, test errors, and computational efficiency.
Contribution
It introduces methods to compute optimal posteriors for chi-squared divergence bounds, compares them with KL-divergence posteriors, and assesses their practical performance and computational aspects.
Findings
Chi-squared divergence posteriors have weaker bounds and worse test errors than KL-divergence based posteriors.
KL-divergence based posteriors are more effective in test error performance.
Proposed fixed point equations enable fast computation of optimal posteriors.
Abstract
We investigate optimal posteriors for the recently introduced chi-squared divergence based PAC-Bayesian bounds \cite{begin2016pac} in terms of the nature of their distribution, the scalability of computations, and test set performance. For a finite classifier set, we deduce bounds for three distance functions: KL-divergence, linear and squared distances. Optimal posterior weights are proportional to deviations of empirical risks, usually with subset support. For a uniform prior, it is sufficient to search among posteriors on classifier subsets ordered by these risks. We show that bound minimization for the linear distance is a convex program and obtain a closed-form expression for its optimal posterior. In contrast, bound minimization for the squared distance is a quasi-convex program under a specific condition, and that for KL-divergence is a non-convex optimization problem (a difference of convex functions). To compute such optimal…
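As a rough illustration of the quantities the abstract refers to, the sketch below computes the chi-squared and KL divergences between a posterior and a prior on a finite classifier set, and evaluates them on nested supports ordered by empirical risk. The function names, the example risks, and the choice of a uniform posterior on each support are assumptions made for illustration only; this is not the paper's optimal posterior or its bound.

```python
import math

def chi2_divergence(q, p):
    """Chi-squared divergence: sum_i q_i^2 / p_i - 1 on a finite set."""
    return sum(qi * qi / pi for qi, pi in zip(q, p)) - 1.0

def kl_divergence(q, p):
    """KL divergence: sum_i q_i * log(q_i / p_i), with 0 * log 0 taken as 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def averaged_risk(q, risks):
    """Posterior-weighted average of the classifiers' empirical risks."""
    return sum(qi * ri for qi, ri in zip(q, risks))

def uniform_on_k_best(risks, k):
    """Hypothetical helper: uniform weights on the k lowest-risk classifiers."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    q = [0.0] * len(risks)
    for i in order[:k]:
        q[i] = 1.0 / k
    return q

# Made-up empirical risks for four classifiers, with a uniform prior.
risks = [0.30, 0.10, 0.25, 0.15]
prior = [1.0 / len(risks)] * len(risks)

# Walk through nested supports ordered by empirical risk.
for k in range(1, len(risks) + 1):
    q = uniform_on_k_best(risks, k)
    print(k, averaged_risk(q, risks), chi2_divergence(q, prior), kl_divergence(q, prior))
```

Under a uniform prior on H classifiers, a uniform posterior on k of them has chi-squared divergence H/k - 1 and KL divergence log(H/k). Shrinking the support therefore lowers the averaged empirical risk but raises the divergence term, which is the kind of trade-off a bound-minimizing posterior has to balance.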
Taxonomy
Topics: Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques
Methods: Support Vector Machine
