Budgeted Recommendation with Delayed Feedback
Kweiguu Liu, Setareh Maghsudi

TL;DR
This paper addresses the challenge of resource-constrained decision-making in contextual bandits with delayed feedback, proposing a new policy to optimize resource use despite delays and limited budgets.
Contribution
It introduces DORAL, a novel policy designed to handle delayed feedback in constrained contextual bandits, improving resource allocation under such conditions.
Findings
DORAL effectively manages delayed feedback in resource-limited settings.
The policy improves decision accuracy despite feedback delays.
Application to COVID-19 resource distribution demonstrates practical benefits.
Abstract
In a conventional contextual multi-armed bandit problem, the feedback (or reward) is immediately observable after an action. Nevertheless, delayed feedback arises in numerous real-life situations and is particularly crucial in time-sensitive applications. The exploration-exploitation dilemma becomes particularly challenging under such conditions, as it couples with the interplay between delays and limited resources. Besides, a limited budget often aggravates the problem by restricting the exploration potential. A motivating example is the distribution of medical supplies at the early stage of COVID-19. The delayed feedback of testing results, thus insufficient information for learning, degraded the efficiency of resource allocation. Motivated by such applications, we study the effect of delayed feedback on constrained contextual bandits. We develop a decision-making policy,…
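To make the setting concrete, the following is a minimal simulation sketch of a budgeted contextual bandit whose rewards arrive late. It is not DORAL, whose details are not given in the abstract above. The function name `run_delayed_budgeted_bandit`, the fixed delay, the unit cost per pull, the Bernoulli rewards, and the epsilon-greedy rule are all illustrative assumptions.

```python
import heapq
import random


def run_delayed_budgeted_bandit(horizon, budget, delay, reward_prob,
                                epsilon=0.1, seed=0):
    """Simulate a budgeted contextual bandit with rewards delayed by `delay` rounds.

    reward_prob[c][a] is a hypothetical Bernoulli reward probability of arm a
    under context c. Every pull costs one unit of budget (an assumption).
    Returns (total_reward, pulls_made, feedback_observed).
    """
    rng = random.Random(seed)
    n_ctx, n_arms = len(reward_prob), len(reward_prob[0])
    counts = [[0] * n_arms for _ in range(n_ctx)]   # observed feedback counts
    sums = [[0.0] * n_arms for _ in range(n_ctx)]   # observed reward sums
    pending = []  # min-heap of (arrival_round, context, arm, reward)
    total_reward, pulls, observed = 0.0, 0, 0

    def estimate(c, a):
        # Optimistic value for pairs with no observed feedback yet.
        return sums[c][a] / counts[c][a] if counts[c][a] else 1.0

    for t in range(horizon):
        # Reveal only the feedback whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, c, a, r = heapq.heappop(pending)
            counts[c][a] += 1
            sums[c][a] += r
            observed += 1

        if pulls >= budget:
            continue  # budget exhausted: keep receiving feedback, stop acting

        ctx = rng.randrange(n_ctx)
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            best = max(estimate(ctx, a) for a in range(n_arms))
            arm = rng.choice(
                [a for a in range(n_arms) if estimate(ctx, a) == best]
            )  # exploit, breaking ties at random

        reward = 1.0 if rng.random() < reward_prob[ctx][arm] else 0.0
        total_reward += reward  # realized now, but not yet visible to the learner
        pulls += 1
        heapq.heappush(pending, (t + delay, ctx, arm, reward))

    return total_reward, pulls, observed
```

In this sketch, a longer delay means the learner spends more of its budget before any estimate is updated, which is the coupling between delays and limited resources that the abstract describes. When `delay` is at least `horizon`, no feedback is ever observed and every choice is made blind.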
Taxonomy
Topics: Consumer Market Behavior and Pricing · Forecasting Techniques and Applications · Advanced Bandit Algorithms Research
