Predicting Training Time Without Training
Luca Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

TL;DR
This paper introduces a method to predict how many optimization steps a pre-trained deep network needs during fine-tuning by modeling its training dynamics with a low-dimensional SDE, at a 30 to 45-fold lower cost than actual training.
Contribution
We propose a novel approach that uses linearized models and SDEs to accurately predict training time without actual training, enabling efficient resource estimation.
Findings
Predicts ResNet training time within a 20% error margin across a variety of datasets and hyper-parameters
Achieves a 30 to 45-fold reduction in cost compared to actual training
Can predict training time on large datasets using subset sampling
Abstract
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function. To do so, we leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model. This allows us to approximate the training loss and accuracy at any point during training by solving a low-dimensional Stochastic Differential Equation (SDE) in function space. Using this result, we are able to predict the time it takes for Stochastic Gradient Descent (SGD) to fine-tune a model to a given loss without having to perform any training. In our experiments, we are able to predict training time of a ResNet within a 20% error margin on a variety of datasets and hyper-parameters, at a 30 to 45-fold reduction in cost compared to actual training. We also discuss how to further…
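Illustration
The idea of reading training time off a linearized model can be sketched in a few lines. The sketch below is a deliberate simplification and not the paper's method: it assumes full-batch gradient descent on a mean-squared-error loss, so the dynamics are deterministic and no SDE is solved. The names predict_steps and simulate_steps, the random Jacobian J, the initial residual r0, and the hyper-parameter values are all hypothetical. Under these assumptions the residual evolves as r_{t+1} = (I - lr * J J^T / n) r_t, so the loss at any step follows in closed form from the eigendecomposition of J J^T.

```python
import numpy as np

def predicted_loss(eigvals, coeffs, lr, n, t):
    # Closed-form MSE loss of the linearized model after t full-batch GD steps.
    decay = (1.0 - lr * eigvals / n) ** (2 * t)
    return 0.5 / n * np.sum(decay * coeffs ** 2)

def predict_steps(J, r0, lr, target, max_steps=100_000):
    # Predict the first step at which the loss drops to `target`,
    # using only the spectrum of the Gram matrix J J^T (no training loop).
    n = J.shape[0]
    eigvals, U = np.linalg.eigh(J @ J.T)
    coeffs = U.T @ r0  # initial residual expressed in the eigenbasis
    for t in range(max_steps + 1):
        if predicted_loss(eigvals, coeffs, lr, n, t) <= target:
            return t
    return None

def simulate_steps(J, r0, lr, target, max_steps=100_000):
    # Reference: actually run gradient descent on the linearized model.
    n = J.shape[0]
    dw = np.zeros(J.shape[1])  # weight displacement from the pre-trained point
    for t in range(max_steps + 1):
        r = r0 + J @ dw  # current residual f(x) - y
        if 0.5 / n * (r @ r) <= target:
            return t
        dw -= lr / n * (J.T @ r)
    return None

# Hypothetical toy problem: 20 samples, 50 parameters.
rng = np.random.default_rng(0)
J = rng.normal(size=(20, 50))  # stand-in for the network Jacobian at initialization
r0 = rng.normal(size=20)       # stand-in for the initial residual
lr, target = 0.1, 1e-3

predicted = predict_steps(J, r0, lr, target)
simulated = simulate_steps(J, r0, lr, target)
print("predicted steps:", predicted, "simulated steps:", simulated)
```

In this toy setting the prediction costs one eigendecomposition instead of a training run. Handling mini-batch noise, as the abstract describes, would require the stochastic treatment that this sketch omits.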
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference
Methods: Average Pooling · 1x1 Convolution · Global Average Pooling · Kaiming Initialization · Batch Normalization · Residual Connection · Max Pooling · Residual Block · Bottleneck Residual Block
