Bi-Level Contextual Bandits for Individualized Resource Allocation under Delayed Feedback

Mohammadsina Almasi; Hadis Anahideh

arXiv:2511.10572 · cs.AI · November 17, 2025

TL;DR

This paper introduces a bi-level contextual bandit framework for personalized resource allocation that accounts for delayed feedback, heterogeneity, and fairness, improving decision-making in high-stakes domains.

Contribution

It presents a novel bi-level model combining subgroup-level budget optimization with individual responsiveness detection, explicitly modeling delays and dynamics in real-world settings.

Findings

01. Outperforms existing methods in cumulative outcomes.

02. Adapts effectively to different feedback delay structures.

03. Ensures equitable distribution of resources across subgroups.

Abstract

Equitably allocating limited resources in high-stakes domains, such as education, employment, and healthcare, requires balancing short-term utility with long-term impact, while accounting for delayed outcomes, hidden heterogeneity, and ethical constraints. However, most learning-based allocation frameworks either assume immediate feedback or ignore the complex interplay between individual characteristics and intervention dynamics. We propose a novel bi-level contextual bandit framework for individualized resource allocation under delayed feedback, designed to operate in real-world settings with dynamic populations, capacity constraints, and time-sensitive impact. At the meta level, the model optimizes subgroup-level budget allocations to satisfy fairness and operational constraints. At the base level, it identifies the most responsive individuals within each group using a neural network…
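The two-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' method: the paper optimizes subgroup budgets under fairness and operational constraints and scores individuals with a neural network, whereas the sketch below assumes a simple minimum-share rule at the meta level, an arbitrary scoring function at the base level, and a plain queue for delayed outcomes. All names (`split_budget`, `select_individuals`, `DelayedFeedback`, `min_share`) are hypothetical and chosen only for illustration.

```python
def split_budget(total, group_sizes, min_share=0.1):
    """Meta level (assumed rule): give every group a guaranteed floor,
    then divide the remaining slots in proportion to group size."""
    groups = list(group_sizes)
    floor = int(total * min_share)
    budget = {g: floor for g in groups}
    remaining = total - floor * len(groups)
    population = sum(group_sizes.values())
    for g in groups:
        budget[g] += remaining * group_sizes[g] // population
    # Integer division can leave a few slots unassigned;
    # hand them out one each, largest groups first.
    leftover = total - sum(budget.values())
    for g in sorted(groups, key=lambda g: -group_sizes[g])[:leftover]:
        budget[g] += 1
    return budget


def select_individuals(members, score, k):
    """Base level (stand-in for the paper's neural network): pick the k
    members with the highest predicted responsiveness."""
    return sorted(members, key=score, reverse=True)[:k]


class DelayedFeedback:
    """Holds outcomes until the round in which they become observable."""

    def __init__(self):
        self.pending = []  # tuples of (arrival_round, individual_id, outcome)

    def record(self, round_now, delay, individual_id, outcome):
        self.pending.append((round_now + delay, individual_id, outcome))

    def collect(self, round_now):
        """Return and remove every outcome that has arrived by round_now."""
        ready = [p for p in self.pending if p[0] <= round_now]
        self.pending = [p for p in self.pending if p[0] > round_now]
        return ready
```

In a full loop, each round would call `split_budget` once, call `select_individuals` per subgroup with that group's budget as `k`, and update the scoring model only from outcomes returned by `collect`, so that decisions never rely on feedback that has not yet arrived.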

Taxonomy

Topics: Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques · Reinforcement Learning in Robotics