Deep Exploration for Recommendation Systems
Zheqing Zhu, Benjamin Van Roy

TL;DR
This paper develops deep exploration methods for recommendation systems. It frames recommendation as a sequential decision problem so that the system can probe for and learn from delayed, sparse feedback, and it reports benefits over single-step exploration in simulator experiments.
Contribution
It formulates recommendation as a sequential decision process and develops deep exploration techniques to better learn from delayed feedback, especially in sparse reward scenarios.
Findings
Deep exploration outperforms single-step exploration.
Significant improvements demonstrated in industrial-grade simulators.
Effective learning from delayed and sparse feedback achieved.
Abstract
Modern recommendation systems ought to benefit by probing for and learning from delayed feedback. Research has tended to focus on learning from a user's response to a single recommendation. Such work, which leverages methods of supervised and bandit learning, forgoes learning from the user's subsequent behavior. Where past work has aimed to learn from subsequent behavior, there has been a lack of effective methods for probing to elicit informative delayed feedback. Effective exploration through probing for delayed feedback becomes particularly challenging when rewards are sparse. To address this, we develop deep exploration methods for recommendation systems. In particular, we formulate recommendation as a sequential decision problem and demonstrate benefits of deep exploration over single-step exploration. Our experiments are carried out with high-fidelity industrial-grade simulators…
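To make the contrast between single-step and deep exploration concrete, here is a minimal toy sketch. It is not the paper's method or simulator. It assumes a hypothetical deterministic "user" who converts (large delayed reward) only after N consecutive novel recommendations, while a familiar recommendation pays a small immediate reward and resets progress. For determinism, the deep learner uses optimistic planning over a learned tabular model, a technique swapped in purely for illustration. All names (`step`, `run_single_step`, `run_deep`) and reward values are assumptions.

```python
# Toy illustration only: a hypothetical chain-shaped "user" with a delayed, sparse reward.
N = 6                      # consecutive novel items needed before the user converts
FAMILIAR, NOVEL = 0, 1     # the two items the recommender can show
SMALL, BIG = 1.0, 100.0    # immediate click reward vs. delayed conversion reward

def step(s, a):
    """Return (reward, next_state) for showing item a in user state s."""
    if a == NOVEL:
        s2 = s + 1
        return (BIG if s2 == N else 0.0), s2
    return SMALL, 0        # a familiar item pays a little now but resets progress

def greedy(values):
    """Pick the higher-valued action; ties go to FAMILIAR."""
    return max((FAMILIAR, NOVEL), key=lambda a: values[a])

def run_single_step(episodes):
    """Optimism about the immediate reward only (bandit-style exploration)."""
    r_hat = [[BIG, BIG] for _ in range(N)]   # untried pairs look maximally rewarding
    returns = []
    for _ in range(episodes):
        s, total = 0, 0.0
        for _ in range(N):
            a = greedy(r_hat[s])
            r, s2 = step(s, a)
            r_hat[s][a] = r                  # rewards are deterministic in this toy
            total += r
            s = s2
        returns.append(total)
    return returns

def run_deep(episodes):
    """Optimism about the long-run return, propagated backward by planning."""
    model = {}                               # (s, a) -> (reward, next_state)
    returns = []
    for _ in range(episodes):
        # Backward induction over the N-step episode; V[N][*] stays 0.
        V = [[0.0] * (N + 1) for _ in range(N + 1)]
        Q = [[[0.0, 0.0] for _ in range(N)] for _ in range(N)]
        for t in reversed(range(N)):
            for s in range(N):
                for a in (FAMILIAR, NOVEL):
                    if (s, a) in model:
                        r, s2 = model[(s, a)]
                        Q[t][s][a] = r + V[t + 1][s2]
                    else:
                        Q[t][s][a] = BIG     # untried pair: assume the best possible return
                V[t][s] = max(Q[t][s])
        s, total = 0, 0.0
        for t in range(N):
            a = greedy(Q[t][s])
            r, s2 = step(s, a)
            model[(s, a)] = (r, s2)          # record what was observed
            total += r
            s = s2
        returns.append(total)
    return returns

single = run_single_step(50)
deep = run_deep(50)
print("single-step, last episode return:", single[-1])
print("deep, last episode return:", deep[-1])
```

In this toy setup the single-step learner sees that a novel item pays nothing immediately and settles on familiar items, while the planning learner keeps probing along the chain because its optimism covers delayed returns, and it ends up completing the chain every episode. Real recommendation environments are stochastic and far larger, so this only illustrates the qualitative gap described in the abstract.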
Taxonomy
Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Smart Grid Energy Management
