Linear Bandits in High Dimension and Recommendation Systems
Yash Deshpande, Andrea Montanari

TL;DR
This paper models recommendation systems as high-dimensional linear bandits, proposes a policy with proven upper and lower bounds on cumulative reward, and validates it on real datasets. It targets the exploration-exploitation trade-off when data is scarce.
Contribution
It introduces a new policy for high-dimensional linear bandits with theoretical bounds and adapts it for data-rich scenarios, validated on real recommendation datasets.
Findings
Upper and lower bounds on cumulative reward that coincide up to constants in the data-poor (high-dimensional) regime.
Modified policy achieves near-optimal risk in data-rich settings.
Experimental validation on Netflix and MovieLens datasets shows good agreement with theory.
Abstract
A large number of online services provide automated recommendations to help users navigate a large collection of items. New items (products, videos, songs, advertisements) are suggested on the basis of the user's past history and, when available, her demographic profile. Recommendations have to satisfy the dual goal of helping the user to explore the space of available items, while allowing the system to probe the user's preferences. We model this trade-off using linearly parametrized multi-armed bandits, propose a policy and prove upper and lower bounds on the cumulative "reward" that coincide up to constants in the data-poor (high-dimensional) regime. Prior work on linear bandits has focused on the data-rich (low-dimensional) regime and used cumulative "risk" as the figure of merit. For this data-rich regime, we provide a simple modification of our policy that achieves…
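To make the linearly parametrized bandit setting concrete, here is a minimal simulation sketch. It is not the policy proposed in the paper: the unit-norm arms, the ridge-regression estimate, the fixed exploration weight `explore`, and every name below are illustrative assumptions. Each round the recommender picks an arm `x_t`, observes the reward `<x_t, theta> + noise`, and updates its estimate of the hidden preference vector `theta`.

```python
import numpy as np


def simulate_linear_bandit(p=50, horizon=200, noise_std=0.1, explore=0.5, seed=0):
    """Simulate reward_t = <x_t, theta> + noise with unit-norm arms x_t.

    Illustrative only: blends the current ridge estimate of theta with a
    random direction, using a fixed (hypothetical) exploration weight.
    """
    rng = np.random.default_rng(seed)

    # Hidden user preference vector, normalised to unit length (assumption).
    theta = rng.normal(size=p)
    theta /= np.linalg.norm(theta)

    gram = np.eye(p)        # ridge-regularised Gram matrix, lambda = 1
    moment = np.zeros(p)    # running sum of reward_t * x_t
    rewards = np.empty(horizon)

    for t in range(horizon):
        # Ridge estimate of theta from everything observed so far.
        theta_hat = np.linalg.solve(gram, moment)

        # Random unit direction used for exploration.
        noise_dir = rng.normal(size=p)
        noise_dir /= np.linalg.norm(noise_dir)

        norm_hat = np.linalg.norm(theta_hat)
        if norm_hat > 0:
            # Exploit the estimated direction, perturbed by exploration.
            arm = np.sqrt(1.0 - explore**2) * theta_hat / norm_hat + explore * noise_dir
            arm /= np.linalg.norm(arm)
        else:
            # No information yet: recommend a random item.
            arm = noise_dir

        reward = arm @ theta + noise_std * rng.normal()
        gram += np.outer(arm, arm)
        moment += reward * arm
        rewards[t] = reward

    # Final estimate after the last observation.
    theta_hat = np.linalg.solve(gram, moment)
    return theta, theta_hat, rewards
```

Calling `simulate_linear_bandit()` returns the hidden vector, the final estimate, and the per-round rewards, so `rewards.sum()` gives the cumulative reward that the abstract uses as its figure of merit in the data-poor regime.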
