Learning with Exposure Constraints in Recommendation Systems
Omer Ben-Porat, Rotem Torkan

TL;DR
This paper models recommendation systems as a contextual multi-armed bandit problem with exposure constraints, in which content providers must receive enough exposure to remain viable while the system maximizes user welfare, and develops near-optimal algorithms with sub-linear regret.
Contribution
It introduces a bandit framework that incorporates minimum exposure constraints for content providers and provides algorithms with provably near-optimal regret bounds.
Findings
Algorithms achieve sub-linear regret in the proposed model.
Lower bounds show the algorithms are near-optimal.
The model captures the trade-off between keeping providers viable and maximizing user welfare.
Abstract
Recommendation systems are dynamic economic systems that balance the needs of multiple stakeholders. A recent line of work studies incentives from the content providers' point of view. Content providers, e.g., vloggers and bloggers, contribute fresh content and rely on user engagement to create revenue and finance their operations. In this work, we propose a contextual multi-armed bandit setting to model the dependency of content providers on exposure. In our model, the system receives a user context in every round and has to select one of the arms. Every arm is a content provider who must receive a minimum number of pulls every fixed time period (e.g., a month) to remain viable in later rounds; otherwise, the arm departs and is no longer available. The system aims to maximize the users' (content consumers) welfare. To that end, it should learn which arms are vital and ensure they…
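The departure rule described in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' algorithm: it omits user contexts and rewards to isolate the exposure constraint, and the names `ExposureEnv`, `min_pulls`, `period`, `always_arm_zero`, and `meet_thresholds_first` are hypothetical, chosen only for illustration.

```python
class ExposureEnv:
    """Toy model of arm departure under exposure constraints (illustrative only)."""

    def __init__(self, min_pulls, period):
        self.min_pulls = list(min_pulls)  # assumed per-arm threshold per period
        self.period = period              # assumed number of rounds in one period
        self.alive = set(range(len(self.min_pulls)))
        self.pulls = [0] * len(self.min_pulls)
        self.t = 0

    def step(self, arm):
        """Pull one arm; at the end of each period, under-exposed arms depart."""
        if arm not in self.alive:
            raise ValueError(f"arm {arm} has departed")
        self.pulls[arm] += 1
        self.t += 1
        if self.t % self.period == 0:
            self.alive = {
                a for a in self.alive if self.pulls[a] >= self.min_pulls[a]
            }
            self.pulls = [0] * len(self.min_pulls)  # counts reset every period


def always_arm_zero(env):
    """Ignores the constraints and always pulls arm 0."""
    return 0


def meet_thresholds_first(env):
    """Pulls the lowest-index viable arm that is still below its threshold."""
    for a in sorted(env.alive):
        if env.pulls[a] < env.min_pulls[a]:
            return a
    return min(env.alive)


def run(policy, min_pulls, period, rounds):
    """Runs a policy for a number of rounds and returns the surviving arms."""
    env = ExposureEnv(min_pulls, period)
    for _ in range(rounds):
        env.step(policy(env))
    return env.alive


if __name__ == "__main__":
    print(run(always_arm_zero, [2, 2, 2], period=6, rounds=12))
    print(run(meet_thresholds_first, [2, 2, 2], period=6, rounds=12))
```

With three arms that each need two pulls per six-round period, the first policy loses arms 1 and 2 after the first period, while the second keeps all three arms available. A learning algorithm in this setting would additionally have to decide which arms are worth keeping, which this sketch does not attempt.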
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Recommender Systems and Techniques
