Covariance-adapting algorithm for semi-bandits with application to sparse rewards
Pierre Perrault, Vianney Perchet, Michal Valko

TL;DR
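This paper introduces a covariance-adapting algorithm for stochastic combinatorial semi-bandits. Its regret bounds are parameterized by the outcome covariance matrix rather than a sub-Gaussian matrix, and the results extend to sparse outcomes, with recommender systems as an application.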
Contribution
The paper develops an algorithm that estimates the covariance matrix of outcomes in semi-bandits, yielding improved regret bounds, and extends the analysis to sparse outcomes.
Findings
Proved a new lower bound on regret involving the outcome covariance matrix.
Designed an algorithm that adapts to covariance estimates for better performance.
Extended results to sparse outcomes with applications in recommender systems.
Abstract
We investigate stochastic combinatorial semi-bandits, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in standard bandits). Typical distributions considered depend on specific parameter values, whose prior knowledge is required in theory but quite difficult to estimate in practice; an example is the commonly assumed sub-Gaussian family. We alleviate this issue by instead considering a new general family of sub-exponential distributions, which contains bounded and Gaussian ones. We prove a new lower bound on the expected regret for this family, which is parameterized by the unknown covariance matrix of outcomes, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates, and provide a tight asymptotic analysis of the regret. Finally, we apply and extend our results to the family…
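To make "covariance estimates" concrete, here is a minimal sketch of how pairwise covariances could be estimated from semi-bandit feedback, where only the outcomes of the arms in the played action are observed each round. This is an illustration and not the paper's algorithm: the class name `PairwiseCovarianceEstimator`, its methods, and the choice of a plain (biased, 1/n) empirical estimator are all assumptions, and no confidence bonus or arm-selection rule is shown.

```python
import numpy as np


class PairwiseCovarianceEstimator:
    """Hypothetical helper: empirical covariance from semi-bandit feedback.

    Entry (i, j) uses only the rounds in which arms i and j were played
    together, since those are the only rounds where both outcomes are seen.
    """

    def __init__(self, n_arms):
        self.counts = np.zeros((n_arms, n_arms))  # rounds where i and j were both played
        self.sum_x = np.zeros((n_arms, n_arms))   # sum of X_i over those rounds
        self.sum_xy = np.zeros((n_arms, n_arms))  # sum of X_i * X_j over those rounds

    def update(self, action, outcomes):
        """action: distinct arm indices played; outcomes: their observed values."""
        idx = np.asarray(action)
        x = np.asarray(outcomes, dtype=float)
        block = np.ix_(idx, idx)
        self.counts[block] += 1
        self.sum_x[block] += x[:, None]      # row i receives x_i in every column j
        self.sum_xy[block] += np.outer(x, x)

    def covariance(self):
        """Biased (1/n) estimate; NaN for pairs never observed together."""
        n = np.maximum(self.counts, 1)
        mean_i = self.sum_x / n
        mean_j = self.sum_x.T / n
        cov = self.sum_xy / n - mean_i * mean_j
        cov[self.counts == 0] = np.nan
        return cov
```

For example, after `update([0, 1], [1.0, 2.0])` and `update([0, 1], [3.0, 6.0])` on a three-arm estimator, `covariance()[0, 1]` is 2.0, while every entry involving arm 2 stays NaN because that arm was never played.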
