A new Hedging algorithm and its application to inferring latent random variables
Yoav Freund, Daniel Hsu

TL;DR
This paper introduces a novel online learning algorithm for cumulative discounted gain that relies on regret-based weighting instead of exponential weights, and explores its application in inferring latent variables.
Contribution
It proposes a new regret-based weighting scheme for online learning and sketches how such an algorithm can serve as an alternative to Bayesian averaging for latent variable inference.
Findings
The algorithm sets each expert's weight from the master algorithm's regret relative to that expert, rather than from exponential weights.
Experts whose discounted cumulative gain is smaller (worse) than the master algorithm's receive zero weight.
A regret-based algorithm is sketched as an alternative to Bayesian averaging for inferring latent random variables.
Abstract
We present a new online learning algorithm for cumulative discounted gain. This learning algorithm does not use exponential weights on the experts. Instead, it uses a weighting scheme that depends on the regret of the master algorithm relative to the experts. In particular, experts whose discounted cumulative gain is smaller (worse) than that of the master algorithm receive zero weight. We also sketch how a regret-based algorithm can be used as an alternative to Bayesian averaging in the context of inferring latent random variables.
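To make the zero-weight rule concrete, here is a minimal sketch of a regret-based master algorithm. It is an illustration under stated assumptions, not the weighting function from the paper: the abstract does not give the exact formula, so this sketch assumes weights proportional to the positive part of the discounted regret, a uniform fallback when no expert has positive regret, and a hypothetical `discount` parameter. The names `regret_based_weights` and `run_master` are invented for this example.

```python
from typing import List, Tuple


def regret_based_weights(regrets: List[float]) -> List[float]:
    """Assumed rule: weight proportional to the positive part of the regret.

    Experts with regret <= 0 (discounted gain no better than the master's)
    receive zero weight, matching the property stated in the abstract.
    """
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0.0:
        # Assumption: fall back to uniform weights when no expert leads the master.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def run_master(
    gains: List[List[float]], discount: float = 0.9
) -> Tuple[List[List[float]], List[float]]:
    """Run the sketch on gains[t][i], the gain of expert i at round t.

    Returns the weights used at each round and the final discounted regrets.
    """
    n_experts = len(gains[0])
    regrets = [0.0] * n_experts
    history = []
    for round_gains in gains:
        weights = regret_based_weights(regrets)
        history.append(weights)
        # The master's gain is the weighted average of the experts' gains.
        master_gain = sum(w * g for w, g in zip(weights, round_gains))
        # Discounted regret: expert's discounted gain minus the master's.
        regrets = [
            discount * r + (g - master_gain)
            for r, g in zip(regrets, round_gains)
        ]
    return history, regrets


if __name__ == "__main__":
    history, regrets = run_master([[1.0, 0.0], [1.0, 0.0]])
    print(history)
    print(regrets)
```

In this two-expert run, the second expert falls behind the master after the first round and its weight drops to zero in the second round, whereas an exponential-weights scheme would keep it at a small positive weight.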
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms
