Online Recommendations for Agents with Discounted Adaptive Preferences
Arpit Agarwal, William Brown

TL;DR
This paper studies a bandit recommendation problem in which an agent's preferences evolve as a discounted function of its past choices. It proposes algorithms that achieve sublinear regret in the long-term and short-term memory regimes under various preference models.
Contribution
It introduces algorithms for adaptive recommendation with discounted memory, extending prior models to non-uniform memory and providing regret bounds for different preference classes.
Findings
Efficient sublinear regret is achievable for smooth preference models in the long-term memory regime.
For scale-bounded preferences, nearly the entire item simplex can be targeted with sublinear regret.
Extending the regret guarantees to targets beyond the EIRD set is NP-hard in general.
Abstract
We consider a bandit recommendations problem in which an agent's preferences (representing selection probabilities over recommended items) evolve as a function of past selections, according to an unknown preference model. In each round, we show a menu of k items (out of n total) to the agent, who then chooses a single item, and we aim to minimize regret with respect to some target set (a subset of the item simplex) for adversarial losses over the agent's choices. Extending the setting from Agarwal and Brown (2022), where uniform-memory agents were considered, here we allow for non-uniform memory in which a discount factor is applied to the agent's memory vector at each subsequent round. In the "long-term memory" regime (when the effective memory horizon scales with T sublinearly), we show that efficient sublinear regret is obtainable with respect to the set…
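The interaction described in the abstract can be sketched as a small simulation. This is an illustrative sketch only, not the authors' algorithm: the names `discounted_update`, `choice_probabilities`, `linear_preference`, and `simulate`, the symbols `n` (item count), `k` (menu size), and `gamma` (discount factor), the specific preference function, and the uniformly random menu policy are all assumptions introduced here. It assumes the agent picks from the menu with probability proportional to a preference score computed from its normalized memory vector.

```python
import random


def discounted_update(memory, chosen, gamma):
    """Scale all past memory weight by gamma, then add 1 for the chosen item."""
    updated = [gamma * w for w in memory]
    updated[chosen] += 1.0
    return updated


def normalize(weights):
    """Project non-negative weights onto the simplex (uniform if all zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def linear_preference(item, v, floor=0.1):
    """Hypothetical preference model: score grows linearly with memory share."""
    return floor + v[item]


def choice_probabilities(memory, menu, preference):
    """Selection probabilities over the menu, proportional to preference scores."""
    v = normalize(memory)
    scores = [preference(i, v) for i in menu]
    total = sum(scores)
    return [s / total for s in scores]


def simulate(n, k, gamma, rounds, seed=0):
    """Run the agent against a placeholder recommender that shows random menus."""
    rng = random.Random(seed)
    memory = [0.0] * n
    history = []
    for _ in range(rounds):
        menu = rng.sample(range(n), k)  # assumption: no learning, random menu
        probs = choice_probabilities(memory, menu, linear_preference)
        chosen = rng.choices(menu, weights=probs, k=1)[0]
        memory = discounted_update(memory, chosen, gamma)
        history.append(chosen)
    return memory, history
```

With `gamma = 1.0` every past selection carries equal weight, which corresponds to the uniform-memory case. With `gamma < 1` the total memory weight approaches `1 / (1 - gamma)`, which gives one natural reading of an "effective memory horizon" (the paper's exact definition may differ).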
Taxonomy
Topics: Game Theory and Applications · Economic theories and models · Game Theory and Voting Systems
