Scalable Decision Focused Learning via Online Trainable Surrogates

Gaetano Signorelli; Michele Lombardi

arXiv:2512.03861·cs.LG·December 19, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a scalable decision-focused learning method that uses an efficient surrogate to replace costly loss evaluations, improving training efficiency while maintaining solution quality in complex decision support systems.

Contribution

It proposes a novel surrogate-based acceleration technique for decision-focused learning that is unbiased, confidence-aware, and suitable for black-box optimization models.

Findings

01. Reduces inner solver calls significantly

02. Maintains solution quality comparable to state-of-the-art methods

03. Enables scalable decision-focused learning in complex systems

Abstract

Decision support systems often rely on solving complex optimization problems that may require estimating uncertain parameters beforehand. Recent studies have shown that using traditionally trained estimators for this task can lead to suboptimal solutions. Using the actual decision cost as a loss function (an approach called Decision Focused Learning) can address this issue, but with a severe loss of scalability at training time. To restore scalability, we propose an acceleration method based on replacing costly loss function evaluations with an efficient surrogate. Unlike previously defined surrogates, our approach relies on unbiased estimators, reducing the risk of spurious local optima, and can provide information on its local confidence, allowing one to switch to a fallback method when needed. Furthermore, the surrogate is designed for a black-box setting, which enables compensating for…
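The confidence-gated fallback described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy decision problem (pick the item with the lowest predicted cost), a zero-mean Gaussian-process regressor with an RBF kernel as the surrogate, and a fixed threshold on the posterior standard deviation as the confidence test. The names `ConfidenceAwareSurrogate`, `regret`, `std_threshold`, and `solver_calls` are hypothetical and chosen for illustration only.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two equal-length vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def regret(pred_costs, true_costs):
    """Toy stand-in for the expensive decision loss (an assumption).

    The 'solver' picks the item with the lowest predicted cost; regret is
    the true cost of that pick minus the best achievable true cost.
    """
    chosen = min(range(len(pred_costs)), key=lambda i: pred_costs[i])
    return true_costs[chosen] - min(true_costs)


class ConfidenceAwareSurrogate:
    """Per-instance surrogate of the regret, with a solver fallback."""

    def __init__(self, true_costs, std_threshold=0.3, noise=1e-6):
        self.true_costs = true_costs
        self.std_threshold = std_threshold  # hypothetical confidence cutoff
        self.noise = noise                  # jitter on the kernel diagonal
        self.X, self.y = [], []             # predictions seen, exact regrets
        self.solver_calls = 0

    def _posterior(self, x):
        """GP posterior mean and standard deviation at prediction x."""
        if not self.X:
            return 0.0, 1.0  # prior: zero mean, unit standard deviation
        K = [[rbf(a, b) + (self.noise if i == j else 0.0)
              for j, b in enumerate(self.X)]
             for i, a in enumerate(self.X)]
        k_star = [rbf(a, x) for a in self.X]
        alpha = solve(K, self.y)
        v = solve(K, k_star)
        mean = sum(k * a for k, a in zip(k_star, alpha))
        var = max(rbf(x, x) - sum(k * vi for k, vi in zip(k_star, v)), 0.0)
        return mean, math.sqrt(var)

    def loss(self, pred_costs):
        """Return the surrogate value, or the exact regret if unsure."""
        mean, std = self._posterior(pred_costs)
        if std > self.std_threshold:
            # Low confidence: fall back to the solver and refine the surrogate.
            value = regret(pred_costs, self.true_costs)
            self.solver_calls += 1
            self.X.append(list(pred_costs))
            self.y.append(value)
            return value
        return mean
```

In this sketch, repeated queries near an already evaluated prediction are answered by the surrogate and leave `solver_calls` unchanged, while a query far from all stored points triggers a new exact evaluation. The threshold trades solver calls against surrogate error; the paper's actual estimator and switching rule may differ.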

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 2

Strengths

This paper combines multiple statistical tools in interesting ways to approximate the regret landscapes. It appears that the method can outperform most of the baselines in terms of regret/function calls tradeoff.

Weaknesses

The method needs stronger justification. Importantly, although it performs well empirically, there are no experiments on real-world data; all datasets appear to be synthetically generated. A practical method meant to improve computational performance ought to be demonstrated in practice. Secondly, no theoretical properties of the surrogate loss function are given. It would be desirable to have consistency properties of the surrogate lo

Reviewer 02 · Rating 4 · Confidence 4

Strengths

Surrogate methods are attractive because they are agnostic to the specific kind of optimization problem at hand, and this class of methods has attracted several attempts in recent years. This paper appears to improve fairly uniformly over them. I am particularly convinced by the results in the appendix that EGL (the best baseline) has inconsistent performance across its different variants, while the proposed method lacks this hyperparameter (the loss function family) and performs consistently we

Weaknesses

Arguably, this paper is mostly a direct modification of previous ideas: make the family for the surrogate a GP instead of previously proposed parametric families. This is not necessarily a bad thing -- simple ideas that are easily implementable and improve performance can be valuable -- but it places more emphasis on the empirical results being robust. I think that the results right now are short of a fully convincing picture but that this is very fixable with some new experiments. First, the r

Reviewer 03 · Rating 2 · Confidence 3

Strengths

- The paper tackles an important and practical challenge: DFL’s poor training scalability due to repeated solver invocations. It tries to build on a previous approach that also tackles DFL problems by learning surrogate regret losses.
- Experiments are thorough, and the code has been released in the supplemental material.

Weaknesses

- Core Idea. The method replaces the original, computationally expensive regret loss for each training instance with its own trainable GP surrogate, which is repeatedly refined by evaluating that same expensive loss whenever confidence is low. This design essentially amounts to a brute-force caching or regression scheme over previously computed losses.
- Lack of generalization. Training an independent surrogate per sample prevents information sharing and limits the method’s ability to generaliz

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics