On the optimality of the aggregate with exponential weights for low temperatures
Guillaume Lecué, Shahar Mendelson

TL;DR
This paper studies the aggregate with exponential weights (AEW) in regression and shows that it is suboptimal at low temperatures unless a Bernstein condition holds, in which case it is optimal with a refined complexity measure.
Contribution
The paper characterizes the conditions under which AEW is optimal or suboptimal in low-temperature regimes, introducing a Bernstein condition for optimality.
Findings
AEW is suboptimal in expectation for low temperatures.
AEW may concentrate around suboptimal functions with high probability.
Under a Bernstein condition, AEW achieves optimality in low-temperature regimes.
Abstract
Given a finite class of functions $F$, the problem of aggregation is to construct a procedure with a risk as close as possible to the risk of the best element in the class. A classical procedure (PAC-Bayesian statistical learning theory (2004) Paris 6, Statistical Learning Theory and Stochastic Optimization (2001) Springer, Ann. Statist. 28 (2000) 75-87) is the aggregate with exponential weights (AEW), defined by \[\tilde{f}^{\mathrm{AEW}}=\sum_{f\in F}\hat{\theta}(f)f,\qquad \text{where}\quad \hat{\theta}(f)=\frac{\exp(-(n/T)R_n(f))}{\sum_{g\in F}\exp(-(n/T)R_n(g))},\] where $T>0$ is called the temperature parameter and $R_n(\cdot)$ is an empirical risk. In this article, we study the optimality of the AEW in the regression model with random design and in the low-temperature regime. We prove three properties of AEW. First, we show that AEW is a suboptimal aggregation procedure in expectation…
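To make the definition above concrete, here is a minimal Python sketch of the AEW weights and the resulting aggregate. It is an illustration under stated assumptions, not the authors' code: the function names `aew_weights` and `aew_predict` are hypothetical, and the empirical risk $R_n$ is assumed to be the mean squared error, which the abstract does not specify.

```python
import numpy as np

def aew_weights(emp_risks, n, temperature):
    """Exponential weights theta_hat(f) proportional to exp(-(n/T) * R_n(f)).

    emp_risks: empirical risks R_n(f), one per function f in the finite class F.
    n: sample size.
    temperature: the temperature parameter T > 0.
    """
    emp_risks = np.asarray(emp_risks, dtype=float)
    scores = -(n / temperature) * emp_risks
    # Subtracting the maximum leaves the normalized weights unchanged
    # and avoids overflow/underflow when T is small.
    scores = scores - scores.max()
    weights = np.exp(scores)
    return weights / weights.sum()

def aew_predict(predictions, y, temperature):
    """Weights and aggregate values for M candidate functions on n points.

    predictions: array of shape (M, n), row j holding f_j(X_1), ..., f_j(X_n).
    y: array of shape (n,) with the observed responses.
    Assumption: R_n is the empirical squared-loss risk.
    """
    predictions = np.asarray(predictions, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    emp_risks = np.mean((predictions - y) ** 2, axis=1)
    weights = aew_weights(emp_risks, n, temperature)
    # Convex combination sum_f theta_hat(f) * f, evaluated at the sample points.
    return weights, weights @ predictions
```

As $T$ decreases, the weights concentrate on the functions with the smallest empirical risk, so the aggregate behaves increasingly like an empirical risk minimizer; this is the low-temperature regime the abstract refers to.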
