Diversified Recommendations for Agents with Adaptive Preferences
Arpit Agarwal, William Brown

TL;DR
This paper introduces a framework for diversified content recommendations using adversarial bandits, balancing reward optimization with content variety, and provides algorithms with provable regret bounds for locally learnable preference models.
Contribution
It formalizes the diversified recommendation problem as an adversarial bandit task and develops algorithms with regret guarantees for locally learnable preference models.
Findings
The proposed algorithm achieves O(T^{3/4}) regret for diversified recommendations.
High-entropy distributions are shown to be realizable at any history.
Negative results justify the assumptions and limitations of the approach.
Abstract
When an Agent visits a platform that recommends a menu of content to select from, their choice of item depends not only on fixed preferences but also on their prior engagements with the platform. The Recommender's primary objective is typically to encourage content consumption that optimizes some reward, such as ad revenue, but they often also aim to ensure that a wide variety of content is consumed by the Agent over time. We formalize this problem as an adversarial bandit task. At each step, the Recommender presents a menu of $k$ (out of $n$) items to the Agent, who selects one item in the menu according to their unknown preference model, which maps their history of past items to relative selection probabilities. The Recommender then observes the Agent's chosen item and receives bandit feedback of the item's reward. In addition to optimizing reward from selected items, the Recommender…
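The interaction protocol described in the abstract can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' algorithm: the names `novelty_preference`, `agent_choose`, and `run_protocol` are hypothetical, the preference model (weights that shrink as an item is consumed more often) is an invented example of a history-dependent model, and the Recommender here just draws a uniformly random menu rather than learning anything. It only shows the loop: a menu of k out of n items, a history-dependent choice made in proportion to relative weights, and bandit feedback consisting of the chosen item's reward alone.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical type: a preference model maps the Agent's history of past
# items to a nonnegative relative weight for every item in range(n).
PreferenceModel = Callable[[Sequence[int], int], List[float]]


def novelty_preference(history: Sequence[int], n: int) -> List[float]:
    """Illustrative (assumed) model: frequently consumed items get lower weight."""
    counts = [0] * n
    for item in history:
        counts[item] += 1
    return [1.0 / (1 + c) for c in counts]


def agent_choose(menu: Sequence[int], history: Sequence[int], n: int,
                 pref: PreferenceModel, rng: random.Random) -> int:
    """Agent picks one menu item with probability proportional to its weight."""
    weights = pref(history, n)
    menu_weights = [weights[i] for i in menu]
    return rng.choices(list(menu), weights=menu_weights, k=1)[0]


def run_protocol(n: int, k: int, T: int, rewards: List[List[float]],
                 pref: PreferenceModel, seed: int = 0):
    """Simulate T rounds with a uniformly random menu policy (a placeholder,
    not the paper's method). rewards[t][i] is the adversary's reward for
    item i at step t; the Recommender only learns the chosen item's entry."""
    rng = random.Random(seed)
    history: List[int] = []
    total = 0.0
    for t in range(T):
        menu = rng.sample(range(n), k)        # Recommender shows k of n items
        item = agent_choose(menu, history, n, pref, rng)
        total += rewards[t][item]             # bandit feedback: one reward only
        history.append(item)                  # history shapes future choices
    return history, total


if __name__ == "__main__":
    n, k, T = 5, 2, 100
    rewards = [[((i + t) % n) / n for i in range(n)] for t in range(T)]
    history, total = run_protocol(n, k, T, rewards, novelty_preference)
    print(len(history), round(total, 3))
```

A learning Recommender would replace the random `rng.sample` call with a policy that uses the observed history and rewards; the diversity objective would additionally constrain the distribution of items in `history` over time.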
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Misinformation and Its Impacts
