Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport
Harry Amad, Mihaela van der Schaar

TL;DR
This paper introduces the task of Hyperparameter Trajectory Inference (HTI) and addresses it with conditional Lagrangian optimal transport: it models how a neural network's conditional output distribution changes with its hyperparameters, so that behaviour can be adapted after deployment without retraining.
Contribution
It develops a novel method combining conditional optimal transport and Lagrangian dynamics to accurately infer neural network behavior across hyperparameters.
Findings
- The method outperforms existing methods at reconstructing NN outputs across hyperparameter ranges.
- It incorporates the manifold hypothesis and least-action principles to keep the surrogate model's inferred paths feasible.
- It is demonstrated on a variety of neural network models and hyperparameter settings.
Abstract
Neural networks (NNs) often have critical behavioural trade-offs that are set at design time with hyperparameters, such as reward weights in reinforcement learning or quantile targets in regression. Post-deployment, however, user preferences can evolve, making initial settings undesirable and necessitating potentially expensive retraining. To circumvent this, we introduce the task of Hyperparameter Trajectory Inference (HTI): to learn, from observed data, how a NN's conditional output distribution changes with its hyperparameters, and construct a surrogate model that approximates the NN at unobserved hyperparameter settings. HTI requires extending existing trajectory inference approaches to incorporate conditions, exacerbating the challenge of ensuring inferred paths are feasible. We propose an approach based on conditional Lagrangian optimal transport, jointly learning the Lagrangian…
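For readers unfamiliar with the term, a generic conditional Lagrangian optimal transport problem can be sketched as follows. This is the standard textbook form rather than the paper's own formulation (the abstract above is cut off before it states what is learned), and the symbols $\mu_0$, $\mu_1$, $\pi$, $\gamma$, $L$, and the conditioning variable $y$ are notation assumed here for illustration.

$$
c(x_0, x_1 \mid y) \;=\; \inf_{\gamma:\ \gamma(0)=x_0,\ \gamma(1)=x_1} \int_0^1 L\big(\gamma(t), \dot{\gamma}(t) \mid y\big)\, dt,
\qquad
\mathrm{OT}_c(\mu_0, \mu_1 \mid y) \;=\; \inf_{\pi \in \Pi(\mu_0(\cdot \mid y),\, \mu_1(\cdot \mid y))} \int c(x_0, x_1 \mid y)\, d\pi(x_0, x_1)
$$

With the kinetic-energy Lagrangian $L(x, v) = \tfrac{1}{2}\lVert v \rVert^2$, the cost reduces to $\tfrac{1}{2}\lVert x_1 - x_0 \rVert^2$ and the least-action paths are straight lines; other choices of $L$ bend the paths. In the HTI setting described above, the path parameter $t$ would presumably correspond to the hyperparameter being varied.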
Peer Reviews
Decision: ICLR 2026 Oral
- HTI is a useful abstraction: many hyperparameters in practice (reward weights, discount factors, quantile levels, robustness coefficients) define families of policies or predictors, but current practice trains only a few points on that curve. Framing this as conditional trajectory inference and tying it to OT provides a principled way to discuss “hyperparameter-induced dynamics.”
- The authors do not simply make their approach learn pairwise conditional OT maps. It learns the cost itself via a
- ... but all (few) real evaluation tasks are fairly low-dimensional on the output side, forecasts with short horizons. The method is sold as applicable to “complex and higher-dimensional geometries,” but the experiments do not necessarily convey this statement.
- The RL scenarios considered are in fact well-behaved; the parameter $\lambda$ yields a linear combination of "objectives" (e.g., main reward+penalty/cost). In such cases, it is widely known that the resulting trade-offs from tuning $\l
- The paper does a good job of setting up its methodology, with clear explanations of each piece.
- The problem formulation developed here seems new and is well motivated.
- The use of learning a conditional Lagrangian Optimal Transport model is novel and elegant.
- The authors give a range of experiments that demonstrate the ability of this method to learn hyperparameter trajectories.
- It is hard to see what the real use of this method will be, and the message of this gets somewhat lost in the experiments. While this is a cool idea, what are the applications in which this will be most relevant, especially when one doesn't know a priori how chaotic the trajectories will be?
- There are no baselines that this work is able to compare against.
- While the results indicate that the method can do a good job of modeling trajectories with proper tuning, I am missing what insights w
There are two main strong points in this submission. First, the authors nicely link their experiments section to the motivations in the introduction. The reasoning for why one needs hyperparameter inference is also clear and well motivated. Second, there is a good variety of important tasks to which the proposed method applies. In all tasks the authors show an improvement over other methods, sometimes by a significant margin.
Overall, I think this paper has major organization issues: 1) Section 4 is way too short to merit being a section on its own (2 paragraphs), and I think the authors define the problem they are treating way too late in the paper. In my view, this should be done as early as possible (e.g., in the beginning preliminaries section). 2) With `1)` in mind, the preliminaries section seems disconnected from the problem statement. It could be nice to highlight how conditional OT/Lagrangian OT relate to t
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Gaussian Processes and Bayesian Inference
