You Only Train Once

Christos Sakaridis

arXiv:2506.04349·cs.LG·June 6, 2025

Open Access

TL;DR

This paper introduces YOTO, a method that automatically optimizes loss hyperparameters in a single training run using gradient-based methods, streamlining the training process and improving performance in computer vision tasks.

Contribution

YOTO is a novel approach that treats loss hyperparameters as learnable parameters, enabling one-shot optimization through differentiable composite loss modeling and regularization.

Findings

01. YOTO outperforms grid search in 3D estimation tasks.

02. YOTO achieves better generalization on unseen data.

03. The method is effective for semantic segmentation as well.

Abstract

The title of this paper is perhaps an overclaim. Of course, the process of creating and optimizing a learned model inevitably involves multiple training runs, which potentially feature different architectural designs, input and output encodings, and losses. However, our method, You Only Train Once (YOTO), indeed contributes to limiting training to one shot for the last of these aspects: loss selection and weighting. We achieve this by automatically optimizing the loss weight hyperparameters of learned models in one shot via standard gradient-based optimization, treating these hyperparameters as regular network parameters and learning them. To this end, we leverage the differentiability of the composite loss formulation, which is widely used for optimizing multiple empirical losses simultaneously, and model it as a novel layer which is parameterized with a softmax operation that satisfies…
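The softmax-parameterized composite loss mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`softmax`, `composite_loss`, `grad_wrt_logits`) and the loss values are hypothetical, and the regularization that this page mentions is left out because its form is not given here. The sketch only shows that softmax weights are positive, sum to one, and admit a simple gradient with respect to the underlying logits, so they can be updated by the same gradient-based optimizer as the network parameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def composite_loss(logits, losses):
    """Weighted sum of the individual losses, with weights = softmax(logits)."""
    weights = softmax(logits)
    return sum(w * l for w, l in zip(weights, losses))


def grad_wrt_logits(logits, losses):
    """Analytic gradient of the composite loss w.r.t. each logit.

    With w = softmax(z) and L = sum_j w_j * l_j, the chain rule gives
    dL/dz_i = w_i * (l_i - L).
    """
    weights = softmax(logits)
    total = sum(w * l for w, l in zip(weights, losses))
    return [w * (l - total) for w, l in zip(weights, losses)]


# Hypothetical values for three individual losses at one training step.
logits = [0.0, 0.0, 0.0]   # learnable weight logits (assumed zero init)
losses = [0.9, 0.3, 0.6]   # illustrative numbers, not from the paper

print("weights:  ", softmax(logits))
print("composite:", composite_loss(logits, losses))
print("gradient: ", grad_wrt_logits(logits, losses))
```

The gradient is positive for every loss above the weighted average and negative for every loss below it, so unregularized gradient descent on the logits alone would keep shifting weight toward the smallest loss. That is one plausible reason a regularizer is needed alongside the softmax layer, although the form used by YOTO is not specified on this page.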

Taxonomy

Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · Domain Adaptation and Few-Shot Learning

Methods: Softmax