Oracle-efficient Hybrid Learning with Constrained Adversaries

Princewill Okoroafor; Robert Kleinberg; Michael P. Kim

arXiv:2603.04546 · cs.LG · March 6, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces an oracle-efficient algorithm for hybrid online learning that achieves near-optimal regret when the adversary is constrained to a fixed class of labeling functions, narrowing the gap between statistical optimality and computational efficiency.

Contribution

The paper introduces a structured setting in which the adversary must pick labels from a fixed class of functions $R$, and develops an algorithm, efficient given an ERM oracle, whose regret bound is stated in terms of Rademacher complexity, advancing the theory of hybrid online learning.

Findings

01

Achieves regret scaling with the Rademacher complexity of a derived class

02

Provides an oracle-efficient algorithm for high-dimensional zero-sum games

03

Develops novel tools including a Frank-Wolfe reduction and tail bounds for hybrid martingales

Abstract

The Hybrid Online Learning Problem, where features are drawn i.i.d. from an unknown distribution but labels are generated adversarially, is a well-motivated setting positioned between statistical and fully-adversarial online learning. Prior work has presented a dichotomy: algorithms that are statistically-optimal, but computationally intractable (Wu et al., 2023), and algorithms that are computationally-efficient (given an ERM oracle), but statistically-suboptimal (Wu et al., 2024). This paper takes a significant step towards achieving statistical optimality and computational efficiency simultaneously in the Hybrid Learning setting. To do so, we consider a structured setting, where the Adversary is constrained to pick labels from an expressive, but fixed, class of functions $R$. Our main result is a new learning algorithm, which runs efficiently given an ERM oracle and obtains regret…
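The interaction protocol described in the abstract can be illustrated with a small simulation. This is only a sketch under loud assumptions: the threshold classes, the brute-force `erm_oracle`, the follow-the-leader learner, and the function names are all hypothetical choices made for illustration, and none of this is the paper's algorithm. Features are drawn i.i.d. from Uniform[0, 1], and the adversary labels each point with a function from a fixed two-element class `label_fns` standing in for $R$.

```python
import random

def make_threshold(theta):
    # Binary threshold classifier x -> 1[x >= theta].
    return lambda x: 1 if x >= theta else 0

def erm_oracle(hypotheses, history):
    # Brute-force ERM over a finite class (illustrative assumption):
    # index of a hypothesis with the fewest mistakes on the history.
    return min(range(len(hypotheses)),
               key=lambda i: sum(hypotheses[i](x) != y for x, y in history))

def run_hybrid_protocol(T=200, seed=0):
    rng = random.Random(seed)
    # Learner's hypothesis class: thresholds 0.0, 0.1, ..., 1.0 (hypothetical).
    hypotheses = [make_threshold(i / 10) for i in range(11)]
    # Adversary's fixed label class, standing in for R (hypothetical).
    label_fns = [make_threshold(0.3), make_threshold(0.7)]
    history, learner_loss = [], 0
    for _ in range(T):
        x = rng.random()  # feature drawn i.i.d. from Uniform[0, 1]
        # Follow-the-leader via the ERM oracle; NOT the paper's algorithm.
        y_hat = hypotheses[erm_oracle(hypotheses, history)](x)
        # Constrained adversary: pick a label function in the class that
        # disagrees with the (deterministic) prediction, if one exists.
        r = max(label_fns, key=lambda f: f(x) != y_hat)
        y = r(x)
        learner_loss += int(y_hat != y)
        history.append((x, y))
    # Loss of the best fixed hypothesis in hindsight.
    best = min(sum(h(x) != y for x, y in history) for h in hypotheses)
    return learner_loss, best, learner_loss - best
```

Calling `run_hybrid_protocol(T=200, seed=0)` returns the learner's cumulative loss, the loss of the best fixed hypothesis in hindsight, and their difference (the regret). A deterministic follow-the-leader learner can be exploited by this adversary, which is one reason randomized or regularized methods are of interest in this setting.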

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

The problem considered in the paper (hybrid online learning) is a very interesting abstraction of "beyond worst case" sequential decision making and appears (at least conceptually) as an intermediate step in several sequential decision-making and game-theoretic problems. Further, the study of oracle efficiency is well motivated in these applications ("best response", for example), which further justifies interest in the problem. Given that, the paper achieving the strong regret guarantees in

Weaknesses

The main issue with the regret bound presented is its dependence on a nonstandard quantity: the Rademacher complexity/VC dimension of the composed loss class. This makes it hard to compare the result directly with previous work, and (if I understand correctly) it should be treated as incomparable to, rather than an improvement on, all previous work in the area (see below).

Reviewer 02 · Rating 4 · Confidence 2

Strengths

- The considered problem is well-motivated and has real-world applications.
- The designed algorithms are intuitive and easy to implement. The algorithm is also computationally efficient compared to previous works.

Weaknesses

- One concern is about the construction of FTRL. Specifically, the role of the entropy regularizer seems under-motivated. In the main text, the “truncated” entropy $v\mapsto \sum_s v(s)\log(v(s)+1)$ is chosen because $a\mapsto a\log(1+a)$ is uniformly strongly convex on $[0,1]$, giving strong convexity over the first $(t-1)$ coordinates at step $t$. But the paper does not explain why one could not use a simpler $\ell_2$ regularizer or mirror maps that may yield cleaner constants or better bound
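The strong-convexity property this comment refers to can be checked directly. As a short derivation, write $\phi(a) = a\log(1+a)$ for the per-coordinate term of the regularizer quoted above (the symbol $\phi$ is notation assumed here, not taken from the paper):

$$
\phi'(a) = \log(1+a) + \frac{a}{1+a}, \qquad
\phi''(a) = \frac{1}{1+a} + \frac{1}{(1+a)^2} = \frac{2+a}{(1+a)^2}.
$$

Since $\frac{d}{da}\,\frac{2+a}{(1+a)^2} = -\frac{3+a}{(1+a)^3} < 0$ for $a \ge 0$, the second derivative is decreasing on $[0,1]$, so

$$
\phi''(a) \;\ge\; \phi''(1) = \frac{3}{4} \quad \text{for all } a \in [0,1],
$$

that is, each coordinate term is $\tfrac{3}{4}$-strongly convex on $[0,1]$. Whether this particular constant is the one used in the paper's analysis is not stated in the excerpt.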

Reviewer 03 · Rating 8 · Confidence 4

Strengths

1. I find the idea of reducing the hybrid online learning problem to an OCO problem quite interesting. This could inspire researchers to explore similar reductions for more complex hybrid settings, such as the smoothed adversary model.
2. As far as I understand, the truncated entropy regularizer is novel and may have broader application scenarios beyond this particular setting.
3. The authors demonstrated the usefulness of their constrained labeling-function formulation in the context of findi

Weaknesses

1. Although the paper provides a use case for the constrained labeling-function setting in games, it still feels somewhat restrictive, especially since the prior result by Wu et al. (2024) does not rely on such constraints. It would strengthen the paper if the authors could present additional examples where similar constraints arise naturally from structural properties of the problem.
2. The paper claims that the obtained regret bound is “near-optimal.” I am not entirely sure how this should be

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques