Online Decision Deferral under Budget Constraints

Mirabel Reid; Tom Sühr; Claire Vernade; Samira Samadi

arXiv:2409.20489·cs.LG·October 1, 2024

Open Access · 1 Repo · 3 Reviews

TL;DR

This paper frames online decision deferral (choosing, task by task, between a fixed ML model and a human expert) as a contextual bandit problem with budget constraints and partial feedback, so that the deferral policy can keep adapting as the task distribution shifts.

Contribution

A contextual bandit formulation of online decision deferral under budget constraints, with theoretical guarantees for the proposed algorithm and efficient extensions evaluated on real-world datasets.

Findings

01

The proposed algorithm comes with theoretical regret guarantees.

02

Efficient extensions of the algorithm perform well on real-world datasets.

03

The framework handles shifting task distributions and several partial-feedback models.

Abstract

Machine Learning (ML) models are increasingly used to support or substitute decision making. In applications where skilled experts are a limited resource, it is crucial to reduce their burden and automate decisions when the performance of an ML model is at least of equal quality. However, models are often pre-trained and fixed, while tasks arrive sequentially and their distribution may shift. In that case, the respective performance of the decision makers may change, and the deferral algorithm must remain adaptive. We propose a contextual bandit model of this online decision making problem. Our framework includes budget constraints and different types of partial feedback models. Beyond the theoretical guarantees of our algorithm, we propose efficient extensions that achieve remarkable performance on real-world datasets.
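
To make the setting in the abstract concrete, the following is a minimal sketch of a budgeted deferral loop. It is not the authors' algorithm: it assumes contexts have already been bucketed into a small number of discrete groups, uses a plain UCB bonus per (group, arm) pair, and enforces the budget as a hard cap on the number of deferrals. The function name `run_budgeted_deferral`, its parameters, and the simulated accuracies are all hypothetical and chosen only for illustration.

```python
import math
import random


def ucb_score(stats, key, t, alpha):
    """Upper confidence bound for one (group, arm) pair; infinite if never tried."""
    n, total = stats.get(key, (0, 0.0))
    if n == 0:
        return float("inf")
    return total / n + alpha * math.sqrt(math.log(t + 1) / n)


def run_budgeted_deferral(groups, model_correct, expert_correct, budget, alpha=1.0):
    """Toy deferral loop. Arm 0 = accept the model, arm 1 = defer to the expert.

    Assumption: each deferral costs one unit of budget, and deferral is
    disabled once the budget is exhausted (a hard cap, not a learned price).
    """
    stats = {}  # (group, arm) -> (pull count, summed reward)
    spent = 0
    total_reward = 0.0
    choices = []
    for t, (g, m_ok, e_ok) in enumerate(
        zip(groups, model_correct, expert_correct), start=1
    ):
        can_defer = spent < budget
        prefer_expert = ucb_score(stats, (g, 1), t, alpha) > ucb_score(
            stats, (g, 0), t, alpha
        )
        arm = 1 if (can_defer and prefer_expert) else 0

        # Partial feedback: only the chosen decision maker's outcome is observed.
        reward = 1.0 if (e_ok if arm == 1 else m_ok) else 0.0

        n, total = stats.get((g, arm), (0, 0.0))
        stats[(g, arm)] = (n + 1, total + reward)
        spent += arm
        total_reward += reward
        choices.append(arm)
    return choices, total_reward, spent


# Simulated stream: 500 tasks in 3 context groups, with made-up accuracies.
rng = random.Random(0)
T = 500
groups = [rng.randrange(3) for _ in range(T)]
model_correct = [rng.random() < 0.7 for _ in range(T)]
expert_correct = [rng.random() < 0.9 for _ in range(T)]

choices, total_reward, spent = run_budgeted_deferral(
    groups, model_correct, expert_correct, budget=40
)
print(f"deferrals used: {spent} of 40, total reward: {total_reward:.0f}")
```

A hard cap tends to spend the budget early in the stream. Approaches in the bandits-with-knapsacks literature, which the reviews below say this paper builds on, instead account for resource consumption inside the decision rule, which this sketch does not attempt.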

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 3

Strengths

The topic of the manuscript originates from a practical problem.

Weaknesses

The original contribution of the paper is hard to justify as it is largely based on the work of Agrawal and Devanur (2016).

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The problem formulation and the algorithm design seem to be novel and of practical interest.
2. Theoretical regret guarantees are provided for the proposed algorithms.

Weaknesses

1. Benchmarking against the optimal static policy seems limiting, as it can be far from the true dynamic optimal policy. Would it be possible, and how difficult would it be, to extend the regret analysis to dynamic regret, which uses the dynamic optimal policy as the benchmark?
2. The regret analysis seems to be a straightforward extension of existing work on UCB-based algorithms for bandit problems, as the authors mention in Section 4.
3. There is no regret guarantee for the neural line

Reviewer 03 · Rating 3 · Confidence 4

Strengths

None

Weaknesses

The paper has limited novelty and contribution.
1. The paper simply uses an existing framework (Bandits with Knapsacks [Badanidiyuru et al., 2018], [Agrawal and Devanur, 2016]) to choose between a model's prediction and deferring to a skilled expert.
2. Limited contribution:
- The regret guarantee provided is a straightforward combination of the linear contextual bandit with knapsack guarantee [Agrawal and Devanur, 2016] with the generalized linear bandit analysis from [Li et al. (2017)],

Code & Models

Repositories

tsuehr/OnlineLearningToDeferWithKnapsack
PyTorch · Official

Taxonomy

Topics: Auction Theory and Applications · Blockchain Technology Applications and Security · Information and Cyber Security