Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support
Hung Tran-The, Sunil Gupta, Thanh Nguyen-Tang, Santu Rana, Svetha Venkatesh

TL;DR
This paper proposes a hybrid offline-online learning method for contextual bandits that effectively handles support deficiency in logged data, providing theoretical guarantees and empirical validation.
Contribution
It introduces a novel hybrid approach combining offline learning with online exploration to address support deficiency in contextual bandit policy learning.
Findings
The method achieves near-optimal policies with minimal online exploration.
Empirical results demonstrate improved performance over traditional offline methods.
The approach provides theoretical guarantees on policy optimality.
Abstract
We address policy learning with logged data in contextual bandits. Current offline-policy learning algorithms are mostly based on inverse propensity score (IPS) weighting, which requires the logging policy to have full support, i.e. a non-zero probability for any context/action of the evaluation policy. However, many real-world systems do not guarantee such logging policies, especially when the action space is large and many actions have poor or missing rewards. With such support deficiency, offline learning fails to find optimal policies. We propose a novel approach that uses a hybrid of offline learning with online exploration. The online exploration is used to explore unsupported actions in the logged data, whilst offline learning is used to exploit supported actions from the logged data, avoiding unnecessary explorations. Our approach determines an optimal policy with…
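To make the support-deficiency problem concrete, the sketch below applies a vanilla IPS estimator to a toy contextual bandit in which the logging policy never plays one action. It is not the paper's hybrid algorithm; the reward table, the policies, and all function names are hypothetical and chosen only for illustration. It shows that IPS assigns zero value to a policy that relies on the unsupported action, even though that policy is the best one, and it lists the context/action pairs that the log cannot cover (the ones online exploration would have to address).

```python
import random

# Hypothetical toy problem: 2 contexts, 3 actions, Bernoulli rewards.
# MEAN_REWARD[x][a] is the true expected reward (assumed for illustration).
MEAN_REWARD = [[0.2, 0.5, 0.9],
               [0.6, 0.3, 0.8]]

# Logging policy mu(a | x): action 2 is never logged, so it is unsupported.
LOGGING = [[0.5, 0.5, 0.0],
           [0.5, 0.5, 0.0]]


def true_value(policy):
    """Expected reward of `policy` under a uniform context distribution."""
    return sum(policy[x][a] * MEAN_REWARD[x][a]
               for x in range(2) for a in range(3)) / 2


def collect_log(n, rng):
    """Sample n (context, action, reward, propensity) tuples from LOGGING."""
    log = []
    for _ in range(n):
        x = rng.randrange(2)
        a = rng.choices(range(3), weights=LOGGING[x])[0]
        r = 1.0 if rng.random() < MEAN_REWARD[x][a] else 0.0
        log.append((x, a, r, LOGGING[x][a]))
    return log


def ips_estimate(policy, log):
    """Vanilla IPS: average of r * pi(a|x) / mu(a|x) over the logged samples."""
    return sum(r * policy[x][a] / p for x, a, r, p in log) / len(log)


def unsupported_actions(log, n_contexts=2, n_actions=3):
    """Context/action pairs that never appear in the log."""
    seen = {(x, a) for x, a, _, _ in log}
    return [(x, a) for x in range(n_contexts)
            for a in range(n_actions) if (x, a) not in seen]


# Two deterministic target policies, written as pi[x][a].
PI_BEST = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]       # uses only the unsupported action
PI_SUPPORTED = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]  # best among logged actions

log = collect_log(20000, random.Random(0))
# IPS is exactly 0.0 for PI_BEST: every logged action has pi(a|x) = 0.
print(true_value(PI_BEST), ips_estimate(PI_BEST, log))
print(true_value(PI_SUPPORTED), ips_estimate(PI_SUPPORTED, log))
print(unsupported_actions(log))
```

In this toy setting a purely offline learner that maximises the IPS estimate would prefer PI_SUPPORTED, because the log carries no information about action 2. Any estimate of that action's reward has to come from new interaction, which is the role the abstract assigns to online exploration.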
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
