Lasso Bandit with Compatibility Condition on Optimal Arm
Harin Lee, Taehyun Hwang, Min-hwan Oh

TL;DR
This paper introduces a new Lasso bandit algorithm whose regret depends only logarithmically on the ambient dimension. It requires no diversity conditions on the context features, relying only on a weaker compatibility condition on the optimal arm.
Contribution
It demonstrates that the compatibility condition alone suffices for logarithmic regret bounds in sparse linear bandits, relaxing previous assumptions and proposing an adaptive algorithm.
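For reference, the compatibility condition in its standard form from the Lasso literature, applied to the optimal arm, can be written as below. The symbols $\Sigma^*$, $S_0$, $s_0$, and $\phi_*$ are notational assumptions made here (with $x_{a^*}$ the context of the optimal arm and $S_0$ the support of the reward parameter, $|S_0| = s_0$); the paper's exact statement of Assumption 3 may differ in constants or normalization.

$$
\Sigma^* = \mathbb{E}\big[x_{a^*} x_{a^*}^\top\big], \qquad
\phi_*^2 \,\|\beta_{S_0}\|_1^2 \;\le\; s_0\, \beta^\top \Sigma^* \beta
\quad \text{for all } \beta \in \mathbb{R}^d \text{ with } \|\beta_{S_0^c}\|_1 \le 3\,\|\beta_{S_0}\|_1 .
$$

Under this reading, the condition constrains only the second-moment matrix of the optimal arm's context, not that of every arm or of the context distribution as a whole.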
Findings
Achieves $O(\text{poly}\log dT)$ regret under the margin condition.
Requires weaker assumptions than existing Lasso bandit algorithms.
Numerical experiments confirm superior performance.
Abstract
We consider a stochastic sparse linear bandit problem where only a sparse subset of context features affects the expected reward function, i.e., the unknown reward parameter has a sparse structure. In the existing Lasso bandit literature, the compatibility conditions, together with additional diversity conditions on the context features, are imposed to achieve regret bounds that only depend logarithmically on the ambient dimension $d$. In this paper, we demonstrate that even without the additional diversity assumptions, the \textit{compatibility condition on the optimal arm} is sufficient to derive a regret bound that depends logarithmically on $d$, and our assumption is strictly weaker than those used in the Lasso bandit literature under the single-parameter setting. We propose an algorithm that adapts the forced-sampling technique and prove that the proposed algorithm achieves…
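The forced-sampling idea mentioned in the abstract can be illustrated with a minimal sketch: sample arms uniformly at random for the first `M0` rounds, then act greedily with respect to a Lasso estimate. This is a simplified illustration, not the authors' algorithm. The simulated environment (Gaussian contexts, a parameter supported on the first `s0` coordinates), the plain ISTA Lasso solver, the regularization schedule `lam0 * sqrt(log(d) / n)`, and all function and parameter names are assumptions made for this example.

```python
import numpy as np


def soft_threshold(v, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_ista(X, y, lam, n_iter=200):
    """Approximately minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 with ISTA."""
    n, d = X.shape
    beta = np.zeros(d)
    # Lipschitz constant of the smooth part: largest eigenvalue of X^T X / n.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    if lipschitz == 0.0:
        return beta
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - grad / lipschitz, lam / lipschitz)
    return beta


def run_forced_sampling_lasso(T=200, d=30, K=5, s0=3, M0=50,
                              lam0=0.1, sigma=0.1, seed=0):
    """Simulate forced sampling followed by greedy Lasso; return per-round regret."""
    rng = np.random.default_rng(seed)
    # Hypothetical sparse reward parameter: only the first s0 features matter.
    beta_star = np.zeros(d)
    beta_star[:s0] = 1.0

    X_hist, y_hist = [], []
    beta_hat = np.zeros(d)
    regret = np.zeros(T)

    for t in range(T):
        contexts = rng.normal(size=(K, d))  # one feature vector per arm
        means = contexts @ beta_star
        if t < M0:
            arm = int(rng.integers(K))  # forced sampling: uniform random arm
        else:
            arm = int(np.argmax(contexts @ beta_hat))  # greedy on the estimate
        reward = means[arm] + sigma * rng.normal()

        X_hist.append(contexts[arm])
        y_hist.append(reward)
        regret[t] = means.max() - means[arm]

        # Refit once the forced-sampling phase is about to end, then every round.
        if t + 1 >= M0:
            n = len(y_hist)
            lam = lam0 * np.sqrt(np.log(d) / n)  # assumed schedule, not from the paper
            beta_hat = lasso_ista(np.vstack(X_hist), np.array(y_hist), lam)
    return regret


if __name__ == "__main__":
    r = run_forced_sampling_lasso()
    print("cumulative regret:", r.sum())
```

In this sketch `M0` plays the role of the forced-sampling length that one reviewer notes could be treated as a tuning parameter; the paper's algorithm and its choice of loss and regularization may differ from what is shown here.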
Peer Reviews
Decision·ICLR 2025 Poster
By weakening the compatibility condition from all arms to only the optimal arm, and by removing the diversity requirements imposed in other high-dimensional bandit papers, this paper makes a solid contribution that broadens the applicability of high-dimensional bandit algorithms. Figure 1 and Table 1 clearly lay out the comparison against the existing literature.
- While the paper criticizes other studies for unverifiable conditions, it does not clearly explain how the compatibility condition on the optimal arm (Assumption 3) is more verifiable in practice. This lack of clarity weakens the argument against existing methods.
- The suggestion to treat the forced sampling iteration number $M_0$ as a tuning parameter could also be applied to other methods where assumptions are stronger and cannot be easily verified as well. This weakens the significance
- Soundness: In theoretical research, relaxing assumptions is undoubtedly meaningful. This work is particularly valuable in that the authors unify the various assumptions used in sparse contextual linear bandits, allowing researchers to focus on a single general condition.
- Clarity: The authors clearly illustrated the relationship between their results and previous work using diagrams, and, through an extensive literature review, they included comparisons in multi-parameter setups to h
- Their algorithm works in a more general 'environment', but to guarantee its performance the learning agent requires additional knowledge, such as the sparsity level $s_0$ or the gap $\Delta_0$. As they have mentioned, their result is not sparsity-agnostic. Even though they discuss this in Appendix D, it is still true that they rely on the sparsity parameter $s_0$ and many other parameters, and I personally think this parameter is somewhat less general info than the noise level $\sigma$ or th
1. The authors provide a lot of intuition for the algorithm design.
2. I think in general the weaker condition, and the algorithm induced, would be a good addition to the literature.
1. The paper needs to be restructured. If the key idea of this paper is the weaker condition proposed, then the main section and main results should be about this condition. Therefore, Appendix B should be the main part of the paper instead of just in the appendix. Besides, the claimed novelty, such as the cyclic induction (Lemma 6?), should be moved to the main text as well. It is hard to read and understand the novelty in the current version.
2. Assumption 3 is claimed to be "strictly weaker"
Taxonomy
Topics: Advanced Bandit Algorithms Research
