Bi-Level Contextual Bandits for Individualized Resource Allocation under Delayed Feedback
Mohammadsina Almasi, Hadis Anahideh

TL;DR
This paper introduces a bi-level contextual bandit framework for personalized resource allocation that accounts for delayed feedback, heterogeneity, and fairness, improving decision-making in high-stakes domains.
Contribution
It presents a novel bi-level model combining subgroup-level budget optimization with individual responsiveness detection, explicitly modeling delays and dynamics in real-world settings.
Findings
Outperforms existing methods in cumulative outcomes.
Adapts effectively to different feedback delay structures.
Ensures equitable distribution across subgroups.
Abstract
Equitably allocating limited resources in high-stakes domains, such as education, employment, and healthcare, requires balancing short-term utility with long-term impact, while accounting for delayed outcomes, hidden heterogeneity, and ethical constraints. However, most learning-based allocation frameworks either assume immediate feedback or ignore the complex interplay between individual characteristics and intervention dynamics. We propose a novel bi-level contextual bandit framework for individualized resource allocation under delayed feedback, designed to operate in real-world settings with dynamic populations, capacity constraints, and time-sensitive impact. At the meta level, the model optimizes subgroup-level budget allocations to satisfy fairness and operational constraints. At the base level, it identifies the most responsive individuals within each group using a neural network…
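To make the two levels concrete, here is a minimal Python sketch of one allocation round. It is not the authors' algorithm: the abstract does not specify the optimization or the network, so everything here is an assumption for illustration. The meta level is replaced by a simple rule (a per-group minimum plus a proportional split of the remainder), the base level takes responsiveness scores as given rather than predicting them with a neural network, and delayed outcomes are held in a queue until their arrival round. The names `allocate_budget`, `select_individuals`, `DelayedFeedback`, and `min_slots` are hypothetical.

```python
import heapq


def allocate_budget(group_values, total_budget, min_slots=1):
    """Meta level (assumed rule, not the paper's optimizer).

    Every group gets `min_slots` as a fairness floor; the remaining slots
    are split in proportion to each group's estimated value.
    """
    groups = list(group_values)
    remaining = total_budget - min_slots * len(groups)
    if remaining < 0:
        raise ValueError("budget too small for the fairness floor")
    total_value = sum(group_values.values())
    if total_value <= 0:
        raise ValueError("group values must sum to a positive number")

    # Proportional shares of the remainder, rounded down.
    shares = {g: remaining * group_values[g] / total_value for g in groups}
    budget = {g: min_slots + int(shares[g]) for g in groups}

    # Hand any leftover slots to the largest fractional parts.
    leftover = total_budget - sum(budget.values())
    by_fraction = sorted(groups, key=lambda g: shares[g] - int(shares[g]),
                         reverse=True)
    for g in by_fraction[:leftover]:
        budget[g] += 1
    return budget


def select_individuals(scores, k):
    """Base level: pick the k individuals with the highest estimated
    responsiveness. `scores` maps individual id -> score; in the paper the
    scores come from a neural network, here they are simply given."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


class DelayedFeedback:
    """Holds outcomes until the round in which they become observable."""

    def __init__(self):
        self._pending = []  # min-heap of (arrival_round, individual, outcome)

    def record(self, arrival_round, individual, outcome):
        heapq.heappush(self._pending, (arrival_round, individual, outcome))

    def release(self, current_round):
        """Return all (individual, outcome) pairs that have arrived so far."""
        ready = []
        while self._pending and self._pending[0][0] <= current_round:
            _, individual, outcome = heapq.heappop(self._pending)
            ready.append((individual, outcome))
        return ready


# One illustrative round with made-up numbers.
group_values = {"A": 3.0, "B": 1.0}
scores = {
    "A": {"a1": 0.9, "a2": 0.2, "a3": 0.5, "a4": 0.7, "a5": 0.1},
    "B": {"b1": 0.4, "b2": 0.8, "b3": 0.6},
}
budget = allocate_budget(group_values, total_budget=6)
chosen = {g: select_individuals(scores[g], budget[g]) for g in budget}
```

In a full loop, the outcomes returned by `release` would be used to update both the group-level value estimates and the individual-level scores before the next round; that update step is omitted because the abstract does not describe it.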
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques · Reinforcement Learning in Robotics
