Oracle-efficient Hybrid Learning with Constrained Adversaries
Princewill Okoroafor, Robert Kleinberg, Michael P. Kim

TL;DR
This paper introduces an oracle-efficient hybrid learning algorithm that achieves near-optimal regret in a setting with structured adversaries, bridging the gap between statistical optimality and computational efficiency.
Contribution
It presents a new structured setting with constrained adversaries and develops an efficient algorithm that attains regret bounds based on Rademacher complexity, advancing hybrid online learning theory.
Findings
Achieves regret scaling with Rademacher complexity of derived class
Provides an oracle-efficient algorithm for high-dimensional zero-sum games
Develops novel tools including a Frank-Wolfe reduction and tail bounds for hybrid martingales
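To make the complexity measure in the first finding concrete, the sketch below estimates the empirical Rademacher complexity of a finite function class by Monte Carlo. It is illustrative only: the name `empirical_rademacher`, the matrix representation of the class, and the number of draws are assumptions made here, and the derived class that the paper's bound refers to is not specified in this summary.

```python
import numpy as np

def empirical_rademacher(values, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    values: array of shape (num_functions, n), where values[j, i] = f_j(x_i)
    for a finite class {f_j} evaluated on a fixed sample x_1, ..., x_n.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = values.shape[1]
    # Each row of sigma is one draw of n independent uniform +/-1 signs.
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))
    # correlations[d, j] = (1/n) * sum_i sigma[d, i] * f_j(x_i)
    correlations = sigma @ values.T / n
    # Take the supremum over the class for each draw, then average over draws.
    return float(correlations.max(axis=1).mean())
```

A class that realizes every sign pattern on the sample attains the maximum value of 1, while a single fixed function gives a value close to 0.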
Abstract
The Hybrid Online Learning Problem, where features are drawn i.i.d. from an unknown distribution but labels are generated adversarially, is a well-motivated setting positioned between statistical and fully-adversarial online learning. Prior work has presented a dichotomy: algorithms that are statistically-optimal, but computationally intractable (Wu et al., 2023), and algorithms that are computationally-efficient (given an ERM oracle), but statistically-suboptimal (Wu et al., 2024). This paper takes a significant step towards achieving statistical optimality and computational efficiency simultaneously in the Hybrid Learning setting. To do so, we consider a structured setting, where the Adversary is constrained to pick labels from an expressive, but fixed, class of functions . Our main result is a new learning algorithm, which runs efficiently given an ERM oracle and obtains regret…
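The interaction described in the abstract can be sketched as a simulation loop. This is a minimal sketch under stated assumptions, not the paper's algorithm: the learner here is plain follow-the-leader driven by a hypothetical `erm_oracle`, the features are assumed uniform on [0, 1], the hypotheses are hypothetical threshold functions, and the constrained adversary is assumed to cycle through its fixed class of labeling functions.

```python
import numpy as np

def make_threshold(c):
    """Hypothetical hypothesis: predicts 1 when x >= c, else 0."""
    return lambda xs: (np.asarray(xs) >= c).astype(int)

def erm_oracle(hypotheses, xs, ys):
    """Index of a hypothesis with the fewest mistakes on the history so far."""
    if not xs:
        return 0  # arbitrary choice before any data is seen
    mistakes = [int(np.sum(h(xs) != np.asarray(ys))) for h in hypotheses]
    return int(np.argmin(mistakes))

def run_hybrid_protocol(hypotheses, adversary_class, T=200, seed=0):
    """Simulate T rounds and return the learner's regret (in mistakes)."""
    rng = np.random.default_rng(seed)
    xs, ys, learner_mistakes = [], [], 0
    for t in range(T):
        x = rng.uniform(0.0, 1.0)  # feature drawn i.i.d. (assumed uniform)
        h = hypotheses[erm_oracle(hypotheses, xs, ys)]  # follow-the-leader
        y_hat = int(h([x])[0])
        # Constrained adversary: the label must come from a function in its
        # fixed class (assumed here to be chosen by cycling through the class).
        g = adversary_class[t % len(adversary_class)]
        y = int(g([x])[0])
        learner_mistakes += int(y_hat != y)
        xs.append(x)
        ys.append(y)
    # Compare against the best fixed hypothesis in hindsight.
    best = min(int(np.sum(h(xs) != np.asarray(ys))) for h in hypotheses)
    return learner_mistakes - best

hypotheses = [make_threshold(c) for c in (0.0, 0.25, 0.5, 0.75, 1.0)]
adversary_class = [make_threshold(0.25), make_threshold(0.75)]
regret = run_hybrid_protocol(hypotheses, adversary_class)
```

Follow-the-leader is used only because it is the simplest learner that touches the data solely through an ERM call; it carries no guarantee of the kind the paper claims.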
Peer Reviews
Decision·ICLR 2026 Poster
The problem considered in the paper (hybrid online learning) is a very interesting abstraction of "beyond worst case" sequential decision making and appears (at least conceptually) as an intermediate step in several sequential decision-making and game-theoretic problems. Further, the study of oracle efficiency is well motivated in these applications ("best response", for example), which further justifies interest in the problem. Given that, the paper achieving the strong regret guarantees in…
The main issue with the regret bound presented is its dependence on a nonstandard quantity: the Rademacher complexity/VC dimension of the composed loss class. This complexity makes it a bit hard to compare the result with previous works directly, and (if I understand correctly) the result should be treated as incomparable to (and not an improvement on) all previous work in the area (see below).
- The considered problem is well-motivated and has real-world applications.
- The designed algorithms are intuitive and easy to implement. The algorithm is also computationally efficient compared to previous works.
- One concern is about the construction of FTRL. Specifically, the role of the entropy regularizer seems under-motivated. In the main text, the "truncated" entropy $v\mapsto \sum_s v(s)\log(v(s)+1)$ is chosen because $\log(1+a)$ is uniformly strongly convex on $[0,1]$, giving strong convexity over the first $(t-1)$ coordinates at step $t$. But the paper does not explain why one could not use a simpler $\ell_2$ regularizer or mirror maps that may yield cleaner constants or better bound…
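The strong-convexity remark in this review can be checked with a short calculation. Assuming the per-coordinate term of the regularizer is $\phi(a) = a\log(1+a)$, as the formula quoted by the reviewer suggests (the constants used in the paper itself may differ):

$$\phi'(a) = \log(1+a) + \frac{a}{1+a}, \qquad \phi''(a) = \frac{1}{1+a} + \frac{1}{(1+a)^2} = \frac{a+2}{(1+a)^2}.$$

Since $\phi''$ is decreasing for $a \ge 0$, its minimum on $[0,1]$ is attained at $a = 1$:

$$\phi''(a) \ge \phi''(1) = \frac{3}{4} \quad \text{for all } a \in [0,1],$$

so under this assumption each coordinate term is $\tfrac{3}{4}$-strongly convex on $[0,1]$, with a bounded derivative at $a = 0$.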
1. I find the idea of reducing the hybrid online learning problem to an OCO problem quite interesting. This could inspire researchers to explore similar reductions for more complex hybrid settings, such as the smoothed adversary model.
2. As far as I understand, the truncated entropy regularizer is novel and may have broader application scenarios beyond this particular setting.
3. The authors demonstrated the usefulness of their constrained labeling-function formulation in the context of findi…
1. Although the paper provides a use case for the constrained labeling-function setting in games, it still feels somewhat restrictive, especially since the prior result by Wu et al. (2024) does not rely on such constraints. It would strengthen the paper if the authors could present additional examples where similar constraints arise naturally from structural properties of the problem.
2. The paper claims that the obtained regret bound is "near-optimal." I am not entirely sure how this should be…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
