Scalable Decision Focused Learning via Online Trainable Surrogates
Gaetano Signorelli, Michele Lombardi

TL;DR
This paper introduces a scalable decision-focused learning method that uses an efficient surrogate to replace costly loss evaluations, improving training efficiency while maintaining solution quality in complex decision support systems.
Contribution
It proposes a novel surrogate-based acceleration technique for decision-focused learning that is unbiased, confidence-aware, and suitable for black-box optimization models.
Findings
Reduces inner solver calls significantly
Maintains solution quality comparable to state-of-the-art methods
Enables scalable decision-focused learning in complex systems
Abstract
Decision support systems often rely on solving complex optimization problems that may require estimating uncertain parameters beforehand. Recent studies have shown that using traditionally trained estimators for this task can lead to suboptimal solutions. Using the actual decision cost as a loss function (called Decision Focused Learning) can address this issue, but with a severe loss of scalability at training time. To address this issue, we propose an acceleration method based on replacing costly loss function evaluations with an efficient surrogate. Unlike previously defined surrogates, our approach relies on unbiased estimators, reducing the risk of spurious local optima, and can provide information on its local confidence, allowing one to switch to a fallback method when needed. Furthermore, the surrogate is designed for a black-box setting, which enables compensating for…
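The idea of a confidence-aware surrogate with a fallback can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-dimensional parameter, a zero-mean Gaussian process with an RBF kernel, a fixed standard-deviation threshold as the confidence test, and a toy two-item selection problem as the "expensive" regret. The names `toy_regret`, `GatedSurrogate`, and `std_threshold` are hypothetical.

```python
import math

TRUE_COSTS = (1.0, 2.0)  # toy ground-truth costs (assumption for illustration)

def toy_regret(theta):
    """Stand-in for the costly loss: solve with predicted costs, score with true ones."""
    predicted = (theta, 1.5)
    chosen = min(range(2), key=lambda i: predicted[i])  # the "solver" call
    return TRUE_COSTS[chosen] - min(TRUE_COSTS)

def rbf(a, b, length=0.5):
    """Squared-exponential kernel on scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

class GatedSurrogate:
    """Answer from the GP when it is confident; otherwise call the true loss and learn from it."""

    def __init__(self, expensive_loss, std_threshold=0.1, noise=1e-6):
        self.loss = expensive_loss
        self.std_threshold = std_threshold
        self.noise = noise
        self.xs, self.ys = [], []
        self.solver_calls = 0

    def posterior(self, x):
        """Return the GP posterior mean and standard deviation at x."""
        if not self.xs:
            return 0.0, 1.0  # prior mean and std before any observation
        K = [[rbf(a, b) + (self.noise if i == j else 0.0)
              for j, b in enumerate(self.xs)] for i, a in enumerate(self.xs)]
        k = [rbf(x, a) for a in self.xs]
        mean = sum(ki * ai for ki, ai in zip(k, solve(K, self.ys)))
        var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
        return mean, math.sqrt(max(var, 0.0))

    def __call__(self, x):
        mean, std = self.posterior(x)
        if std <= self.std_threshold:
            return mean  # confident: no solver call needed
        y = self.loss(x)  # fallback: evaluate the true loss and store it
        self.solver_calls += 1
        self.xs.append(x)
        self.ys.append(y)
        return y

surrogate = GatedSurrogate(toy_regret)
first = surrogate(2.0)   # no data yet, so the true loss is evaluated
second = surrogate(2.0)  # the GP is now confident here and answers itself
```

In this sketch, repeated queries near already-evaluated points are served by the surrogate, so `solver_calls` grows only when the model is uncertain. The threshold trades solver calls against approximation error. A real setting would involve multi-dimensional predictions and a proper GP library.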
Peer Reviews
Decision·Submitted to ICLR 2026
This paper combines multiple statistical tools in interesting ways to approximate the regret landscapes. It appears that the method can outperform most of the baselines in terms of regret/function calls tradeoff.
There is some convincing that needs to be done to justify this method. Importantly, while it performs well empirically on the experiments, there is a lack of real-world data experiments as they all seem to be synthetically generated. For a practical method that is meant to improve computational performance, it ought to be demonstrated in practice. Secondly, there are no theoretical properties of the surrogate loss function. It would be desirable to have consistency properties of the surrogate lo
Surrogate methods are attractive because they are agnostic to the specific kind of optimization problem at hand, and this class of methods has attracted several attempts in recent years. This paper appears to improve fairly uniformly over them. I am particularly convinced by the results in the appendix that EGL (the best baseline) has inconsistent performance across its different variants, while the proposed method lacks this hyperparameter (the loss function family) and performs consistently we
Arguably, this paper is mostly a direct modification of previous ideas: make the family for the surrogate a GP instead of previously proposed parametric families. This is not necessarily a bad thing -- simple ideas that are easily implementable and improve performance can be valuable -- but it places more emphasis on the empirical results being robust. I think that the results right now are short of a fully convincing picture but that this is very fixable with some new experiments. First, the r
- The paper tackles an important and practical challenge: DFL’s poor training scalability due to repeated solver invocations. It tries to build on a previous approach that also tackles DFL problems by learning surrogate regret losses.
- Experiments are thorough, and the code has been released in the supplemental material.
- Core Idea. The method replaces the original, computationally expensive regret loss for each training instance with its own trainable GP surrogate, which is repeatedly refined by evaluating that same expensive loss whenever confidence is low. This design essentially amounts to a brute-force caching or regression scheme over previously computed losses.
- Lack of generalization. Training an independent surrogate per sample prevents information sharing and limits the method’s ability to generaliz
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics
