Can a Learner Regret Using a No-Regret Algorithm? A Control-Theoretic Study of Performance Dominance
Hassan Abdelraouf, Jeff S. Shamma

TL;DR
This paper investigates whether one no-regret learning algorithm can outperform another across all payoff environments. It shows that anticipatory replicator dynamics can globally dominate standard replicator dynamics, so a 'free-lunch' phenomenon can arise in regret minimization.
Contribution
The paper introduces a control-theoretic framework for comparing no-regret algorithms and uses it to demonstrate that anticipatory replicator dynamics can globally outperform standard replicator dynamics.
Findings
Anticipatory RD globally dominates standard RD in all payoff environments.
A passivity-based approach is used for performance comparison.
An optimal control formulation shows that the minimal reward gap is zero.
Abstract
No-regret learning dynamics ensure that a learner asymptotically achieves an average reward no worse than that of any fixed strategy. This no-regret guarantee does not determine the value of the asymptotic average reward. Indeed, different no-regret learning dynamics can exhibit different asymptotic average rewards when facing the same environment, even though both satisfy the no-regret guarantee. This paper asks whether a "free-lunch" phenomenon can arise among no-regret algorithms. Namely, is it possible for one no-regret learning rule to uniformly outperform another no-regret learning rule across all payoff environments? Stated differently, can a learner regret not using a particular no-regret algorithm? We consider generalized replicator dynamics (RD) as a cascade interconnection between a linear time-invariant (LTI) system and the softmax nonlinearity. Varying this LTI…
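The cascade structure described in the abstract can be illustrated with a minimal sketch. It covers only the standard case, under the well-known fact that standard replicator dynamics arise when the LTI block is a pure integrator of the payoff vector feeding a softmax. The anticipatory variant studied in the paper is not reproduced here. The names `softmax`, `simulate_standard_rd`, and `payoff_fn`, the Euler discretization, and the step size are all assumptions made for illustration, not the authors' code.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax mapping scores to a mixed strategy."""
    shifted = z - np.max(z)
    weights = np.exp(shifted)
    return weights / weights.sum()


def simulate_standard_rd(payoff_fn, n_actions, horizon=50.0, dt=0.01):
    """Euler simulation of z' = p(t), x = softmax(z) (illustrative only).

    payoff_fn(t, x) is a hypothetical callback returning the payoff vector
    the environment presents at time t when the learner plays strategy x.
    Returns the final strategy, the average reward, and the average regret
    against the best fixed action in hindsight.
    """
    steps = int(round(horizon / dt))
    z = np.zeros(n_actions)  # integrator state; with z(0) = 0 it equals cumulative payoff
    cum_reward = 0.0         # running integral of x(t) . p(t)
    for k in range(steps):
        t = k * dt
        x = softmax(z)                        # static nonlinearity
        p = np.asarray(payoff_fn(t, x), dtype=float)
        cum_reward += dt * float(x @ p)       # reward actually earned
        z = z + dt * p                        # LTI block: pure integrator
    avg_reward = cum_reward / horizon
    avg_regret = (z.max() - cum_reward) / horizon
    return softmax(z), avg_reward, avg_regret


if __name__ == "__main__":
    # Assumed toy environment: action 0 always pays 1, action 1 always pays 0.
    x_final, avg_reward, avg_regret = simulate_standard_rd(
        lambda t, x: np.array([1.0, 0.0]), n_actions=2
    )
    print(x_final, avg_reward, avg_regret)
```

In this toy environment the strategy concentrates on the better action and the average regret shrinks as the horizon grows. Comparing two no-regret rules in the sense of the abstract would mean replacing the integrator update with a different LTI system and comparing the resulting `avg_reward` values in the same environment.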
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Game Theory and Applications
