Latency-Aware Contextual Bandit: Application to Cryo-EM Data Collection

Lai Wei; Ambuj Tewari; Michael A. Cianfrocco

arXiv:2410.13109·stat.ML·October 10, 2025

TL;DR

This paper presents a latency-aware contextual bandit framework tailored for applications like cryo-EM data collection, optimizing decision-making under delays to maximize cumulative reward.

Contribution

It introduces a novel framework and the COAF algorithm that incorporate action delays into contextual bandit problems, with theoretical regret bounds and practical validation.

Findings

01

The COAF algorithm effectively balances exploration and exploitation considering delays.

02

The approach achieves regret bounds comparable to standard contextual bandits.

03

Numerical experiments show improved reward maximization in cryo-EM data collection.

Abstract

We introduce a latency-aware contextual bandit framework that generalizes the standard contextual bandit problem, where the learner adaptively selects arms and switches decision sets under action delays. In this setting, the learner observes the context and may select multiple arms from a decision set, with the total time determined by the selected subset. The problem can be framed as a special case of semi-Markov decision processes (SMDPs), where contexts and latencies are drawn from an unknown distribution. Leveraging the Bellman optimality equation, we design the contextual online arm filtering (COAF) algorithm, which balances exploration, exploitation, and action latency to minimize regret relative to the optimal average-reward policy. We analyze the algorithm and show that its regret upper bounds match established results in the contextual bandit literature. In numerical…
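The trade-off described in the abstract can be pictured with a toy calculation. The sketch below is not the COAF algorithm: it assumes rewards and latencies are known rather than learned, that each decision set costs a fixed `switch_time` to open, and that each selected arm adds its own latency. Under those assumptions, the best average reward per unit time `rho` solves a fixed-point condition of Bellman type, and the best subset keeps exactly the arms whose reward exceeds the time cost `rho * latency`. All names here (`filter_arms`, `optimal_rate`, `switch_time`) are hypothetical.

```python
def filter_arms(rewards, latencies, rho):
    """Keep the arms whose reward exceeds the time cost rho * latency."""
    return [i for i, (r, l) in enumerate(zip(rewards, latencies)) if r - rho * l > 0]


def optimal_rate(rounds, switch_time, tol=1e-9):
    """Bisection for the reward rate rho in a toy model (an assumption, not the paper's).

    rounds: list of (rewards, latencies) pairs, one pair per decision set.
    switch_time: assumed fixed, positive time spent to open each decision set.
    Solves mean_j sum_a max(r_a - rho * l_a, 0) = rho * switch_time.
    """
    def gap(rho):
        surplus = sum(
            sum(max(r - rho * l, 0.0) for r, l in zip(rs, ls))
            for rs, ls in rounds
        )
        return surplus / len(rounds) - rho * switch_time

    lo, hi = 0.0, 1.0
    while gap(hi) > 0:  # grow the bracket until the root lies in [lo, hi]
        hi *= 2.0
    while hi - lo > tol:  # gap is decreasing in rho, so bisection applies
        mid = (lo + hi) / 2.0
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# One decision set with two arms of equal latency and unequal reward.
rounds = [([1.0, 0.2], [1.0, 1.0])]
rho = optimal_rate(rounds, switch_time=1.0)
kept = filter_arms([1.0, 0.2], [1.0, 1.0], rho)
```

In this example, taking both arms yields 1.2 reward in 3 time units (rate 0.4), while taking only the first arm yields 1.0 reward in 2 time units (rate 0.5), so the weaker arm is filtered out. A learning algorithm would have to estimate both the rewards and the rate online, which this sketch does not attempt.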

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1) The latency-aware formulation is conceptually relevant to real scientific workflows. 2) The mathematical derivations are careful and correct. 3) The paper is generally well written and easy to follow. 4) The cryo-EM example adds color and a nice application context.

Weaknesses

1. The regret bound is very likely suboptimal. 2. There is no lower bound or discussion of optimality. 3. The “latency” feature mostly amounts to a time-rescaling, I think; it is not clear why this warrants a fundamentally new theory. 4. The experiments lack statistical rigor—no error bars or serious baselines. 5. Overall novelty is modest: the algorithm is a straightforward hybrid of known tools (UCB + stochastic approximation).

Reviewer 02 · Rating 2 · Confidence 2

Strengths

The paper has the following strengths:
- The paper provides a theoretical formulation by modeling the latency-aware contextual bandit problem as an SMDP and deriving the corresponding Bellman optimality condition.
- The paper introduces a contextual online arm filtering (COAF) algorithm based on the derived Bellman condition and establishes regret bounds for both linear and general reward function settings.
- The problem is well-motivated by a real-world Cryo-EM application, and the proposed met

Weaknesses

The weaknesses are described below.
- Although the paper formulates the latency-aware contextual bandit problem as an MDP, it does not clearly justify why the proposed method is preferable to existing MDP-based solutions.
- The arm filtering design and regret analysis follow relatively standard techniques, and the paper does not clearly articulate new analytical challenges introduced by latency or contextual dependencies.
- The study focuses solely on the stochastic setting, which can already

Reviewer 03 · Rating 4 · Confidence 4

Strengths

The problem setting is clearly motivated by a concrete use case, cryo-EM data collection, and is designed to address similar use cases. The problem formulation generalizes contextual bandits and combinatorial semi-bandits, which makes it solid. COAF is also supported by the optimality equation, on which its design depends. The experiments use real-world data to validate the motivating example. Along with it, they also show their performance on ot

Weaknesses

The setting allows switching to a new decision set but does not identify the regime in which switching is optimal as opposed to exploiting. The experiments lack proper baselines against which to compare the effectiveness of the proposed algorithm, COAF. The problem setting makes an IID assumption on ($X_j$, $A_j$, $l_j$); however, there may be applications where nonstationarity has to be dealt with and taken into account.

Taxonomy

Topics: Supply Chain and Inventory Management

Methods: ALIGN