Online Decision-Focused Learning
Aymeric Capitaine, Maxime Haddouche, Eric Moulines, Michael I. Jordan, Etienne Boursier, Alain Durmus

TL;DR
This paper develops online algorithms for decision-focused learning in dynamic environments, addressing challenges of non-convexity and non-differentiability, and provides theoretical regret bounds with empirical validation.
Contribution
It introduces the first provably guaranteed online algorithms for decision-focused learning in evolving environments, handling non-convex and non-differentiable objectives.
Findings
Algorithms outperform benchmarks in knapsack experiments
Established static and dynamic regret bounds
Effective handling of non-convex, non-differentiable objectives
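For reference, the two regret notions named above are usually defined as follows in the online learning literature. The symbols here are assumed notation, not taken from the paper: f_t is the round-t loss, θ_t the learner's iterate, Θ the parameter set, and T the horizon. The paper's exact definitions may differ, in particular because its objectives are non-convex.

```latex
\begin{align*}
\text{Static regret:}\quad  & R_T^{\mathrm{s}} = \sum_{t=1}^{T} f_t(\theta_t) - \min_{\theta \in \Theta} \sum_{t=1}^{T} f_t(\theta), \\
\text{Dynamic regret:}\quad & R_T^{\mathrm{d}} = \sum_{t=1}^{T} f_t(\theta_t) - \sum_{t=1}^{T} \min_{\theta \in \Theta} f_t(\theta).
\end{align*}
```

Static regret compares the learner with the best fixed parameter in hindsight, while dynamic regret compares it with a comparator that may change every round, so dynamic regret is the more demanding criterion.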
Abstract
Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks. Instead of merely optimizing for predictive accuracy, DFL trains models to directly minimize the loss associated with downstream decisions. However, existing studies focus solely on scenarios where a fixed batch of data is available and the objective function does not change over time. We instead investigate DFL in dynamic environments where the objective function and data distribution evolve over time. This setting is challenging for online learning because the objective function has zero or undefined gradients, which prevents the use of standard first-order optimization methods, and is generally non-convex. To address these difficulties, we (i) regularize the objective to make it differentiable and (ii) use perturbation techniques along…
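The zero-gradient issue and the effect of regularization described in the abstract can be illustrated with a toy sketch. This is not the paper's method: the simplex decision set, the entropic regularizer, the value `eps=0.5`, and all function names below are assumptions chosen only to make the phenomenon visible.

```python
import numpy as np

def hard_decision(theta):
    """Unregularized decision: maximize <theta, w> over the simplex (a one-hot vertex)."""
    w = np.zeros_like(theta)
    w[np.argmax(theta)] = 1.0
    return w

def soft_decision(theta, eps=0.5):
    """Entropy-regularized decision: maximize <theta, w> + eps * H(w), i.e. softmax(theta / eps)."""
    z = (theta - theta.max()) / eps  # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def decision_loss(w, true_value):
    """Negative realized value of decision w under the true (hypothetical) utilities."""
    return -float(true_value @ w)

def finite_diff_grad(decide, theta, true_value, h=1e-5):
    """Central finite-difference gradient of the decision loss with respect to theta."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = h
        loss_plus = decision_loss(decide(theta + step), true_value)
        loss_minus = decision_loss(decide(theta - step), true_value)
        grad[i] = (loss_plus - loss_minus) / (2 * h)
    return grad

theta = np.array([1.0, 0.2, -0.5])      # predicted utilities (toy values)
true_value = np.array([0.1, 1.0, 0.3])  # realized utilities (toy values)

# The hard decision is piecewise constant in theta, so its gradient vanishes.
grad_hard = finite_diff_grad(hard_decision, theta, true_value)
# The regularized decision varies smoothly with theta, so the gradient is informative.
grad_soft = finite_diff_grad(soft_decision, theta, true_value)

print("gradient through hard decision:", grad_hard)
print("gradient through soft decision:", grad_soft)
```

In this sketch the hard decision stays on the same vertex under small changes of `theta`, so a first-order method receives no signal, whereas the regularized decision gives a nonzero gradient that points toward raising the prediction for the truly best item.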
Peer Reviews
Decision: ICLR 2026 Poster
This is an excellent paper!
* The paper is very well written and includes intuitive explanations along with precise mathematical statements.
* The paper tackles a fundamental problem: online decision-focused learning, where the goal is to train predictive models not just for prediction but for using those predictions in a downstream decision-making task. While prior work looked at this problem in the offline setting, this paper studies this problem in the online setting, provides provable regr
Please see the questions section for some weaknesses/questions.
Originality: The originality arises from a proper problem formulation for the regret analysis of decision-focused online optimization, incorporating static and dynamic regret analysis into the said problem and removing limitations of non-differentiability and non-convexity.
Quality: The submission seems technically correct, experimentally rigorous and reproducible (except minor caveats in the algorithms).
Clarity: The submission is mostly clear.
Significance: The submission presents theoretic
I did not notice any substantial weakness, so I am leaning towards acceptance. One thing to note is that the exact challenge in achieving the said results could be emphasized. Possibly, after proper formulation, everything follows from the existing regret analysis techniques.
* The paper extends the decision-focused framework to online learning, whereas most of the existing literature on the topic is limited to the batch case, where i.i.d. samples and a fixed estimation/optimization problem are available.
* The authors provide a complete analysis of the problem, leveraging state-of-the-art algorithms and tools from the online learning literature.
* The technical novelty of the paper is limited. While the authors stress the challenges posed by the online decision-focused learning setting (non-differentiable and non-convex functions), once formulated as in eq.(3) the problem is amenable to any "standard" online learning treatment. Indeed most of the results in the paper are obtained by carefully instantiating known assumptions, algorithms, and theoretical results, such as assumptions H1/H2, FTL and OGD, regularization and approximate oracl
Taxonomy
Topics: Online and Blended Learning
Methods: Focus
