Managing Solution Stability in Decision-Focused Learning with Cost Regularization
Victor Spitzer, Francois Sanson

TL;DR
This paper analyzes how fluctuations in perturbation intensity during training undermine solution stability in decision-focused learning, and proposes cost regularization to make training more robust and effective.
Contribution
It establishes a theoretical link between fluctuations in perturbation intensity and solution stability in combinatorial optimization, and proposes regularizing the estimated cost vectors to improve learning robustness.
Findings
Regularization improves decision quality in experiments
Fluctuations in perturbation intensity affect training stability
Proposed method enhances robustness of decision-focused models
Abstract
Decision-focused learning integrates predictive modeling and combinatorial optimization by training models to directly improve decision quality rather than prediction accuracy alone. Differentiating through combinatorial optimization problems represents a central challenge, and recent approaches tackle this difficulty by introducing perturbation-based approximations. In this work, we focus on estimating the objective function coefficients of a combinatorial optimization problem. Our study demonstrates that fluctuations in perturbation intensity occurring during the learning phase can lead to ineffective training, by establishing a theoretical link to the notion of solution stability in combinatorial optimization. We propose addressing this issue by introducing a regularization of the estimated cost vectors which improves the robustness and reliability of the learning process, as…
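The effect described in the abstract can be illustrated with a toy perturbed optimizer. The following is a minimal sketch and not the authors' implementation: it assumes additive Gaussian perturbations of fixed intensity `eps`, a brute-force solver over a tiny feasible set, and unit-norm rescaling as a stand-in for cost regularization. All function names are hypothetical.

```python
import numpy as np

def solve(cost, feasible):
    """Return the feasible point minimising cost @ y (brute-force enumeration)."""
    return feasible[np.argmin(feasible @ cost)]

def perturbed_solution(cost, feasible, eps, n_samples, rng):
    """Monte Carlo estimate of E[argmin_y (cost + eps * Z) @ y] with Z ~ N(0, I)."""
    noise = rng.standard_normal((n_samples, cost.size))
    scores = (cost + eps * noise) @ feasible.T  # shape (n_samples, n_feasible)
    picks = feasible[np.argmin(scores, axis=1)]
    return picks.mean(axis=0)

def normalize(cost):
    """Assumed regulariser for illustration: rescale to unit Euclidean norm."""
    return cost / np.linalg.norm(cost)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feasible = np.eye(3)               # toy problem: pick exactly one of three items
    base = np.array([1.0, 2.0, 3.0])   # direction of the estimated cost vector

    for scale in (1000.0, 1.0, 0.001):
        cost = scale * base
        raw = perturbed_solution(cost, feasible, 1.0, 2000, rng)
        reg = perturbed_solution(normalize(cost), feasible, 1.0, 2000, rng)
        print(f"scale={scale:g}  raw={raw.round(2)}  normalized={reg.round(2)}")
```

In this toy setting, a very large cost scale makes the fixed-intensity perturbation negligible, so every sample returns the same solution and the averaged output carries no gradient information. A very small scale lets the noise dominate, so the output is close to uniform. Rescaling the cost vector first keeps the ratio between cost and perturbation constant regardless of how the predicted scale drifts.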
Peer Reviews
Decision · Submitted to ICLR 2026
This paper provides a meaningful conceptual clarification and a practical normalization mechanism that helps stabilize a widely used—but often fragile—class of DFL methods. The insight linking stability radius with learning dynamics is both useful and broadly relevant.
1. Some notations and definitions in the paper are not rigorous; see Questions. 2. The paper lacks a discussion of perturbed optimizers beyond the MILP case.
1. The writing is generally clear. 2. The viewpoint of interpreting different decision-focused learning methods through the concept of solution stability is novel and interesting.
1. The introduction and explanation of the four properties in Section 3 could be clearer; adding examples may aid understanding. 2. There are some typos—for example, inconsistent capitalization of the initial letter in “property.”
I think this work makes a very good case for how controlling the scale of perturbations (or of sampling processes) can dramatically affect the behavior of DFL training methods that make use of this idea (which are by now numerous and among the best performers). The discussion of how different classes of methods either become ineffective or collapse to solution imitation is well done and convincing, even if somewhat informal. I also believe that the proposed normalization technique can be
The key issue I see in this work is that the proposed approach does not appear to address the analyzed problem. Based on the formulation from eq. (19), the normalization mapping is applied to the parameter vector just before it is fed to the optimization process (the f mapping). In a perturbation-based approach, this means that normalization would be applied to the perturbed parameters, after the scale mismatch has already done all the damage extensively documented in the first half of the paper.
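The ordering concern raised in this review can be made concrete with a small sketch. It rests on assumptions that are not confirmed by the text above: that f is the argmin of a linear objective over a finite feasible set, that the normalization is a positive rescaling to unit norm, and that perturbations are additive Gaussian noise. Under those assumptions, rescaling after the perturbation cannot change the returned solution, whereas rescaling before it can. The function names are hypothetical.

```python
import numpy as np

def solve(cost, feasible):
    """argmin over a finite feasible set of the linear objective cost @ y."""
    return feasible[np.argmin(feasible @ cost)]

def unit_norm(v):
    """Assumed normalization for illustration: positive rescaling to unit norm."""
    return v / np.linalg.norm(v)

def perturb_then_normalize(theta, feasible, eps, z):
    """Normalization applied to the already perturbed parameters."""
    return solve(unit_norm(theta + eps * z), feasible)

def normalize_then_perturb(theta, feasible, eps, z):
    """Normalization applied to the prediction, before the noise is added."""
    return solve(unit_norm(theta) + eps * z, feasible)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feasible = np.eye(3)                          # pick exactly one of three items
    theta = 1000.0 * np.array([1.0, 2.0, 3.0])    # prediction with a large scale
    zs = rng.standard_normal((200, 3))

    plain = np.array([solve(theta + z, feasible) for z in zs])
    after = np.array([perturb_then_normalize(theta, feasible, 1.0, z) for z in zs])
    before = np.array([normalize_then_perturb(theta, feasible, 1.0, z) for z in zs])

    print("perturb-then-normalize matches the unnormalized solver:",
          np.array_equal(plain, after))
    print("fraction of samples where normalize-then-perturb differs:",
          np.mean(np.any(plain != before, axis=1)))
```

Whether this matches the behavior of eq. (19) depends on where the paper actually places the normalization relative to the perturbation, which the excerpt here does not settle.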
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research
