Compute, Time and Energy Characterization of Encoder-Decoder Networks with Automatic Mixed Precision Training
Siddharth Samsi, Michael Jones, Mark M. Veillette

TL;DR
This paper analyzes the compute, time, and energy costs of training UNet-based neural networks for weather forecasting, demonstrating that mixed-precision training significantly reduces training time and energy consumption while enabling larger models.
Contribution
The paper systematically explores data-distributed and mixed-precision training for UNet models, showing how performance trades off against compute, time, and energy for precipitation nowcasting.
Findings
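Mixed-precision training significantly reduces training time without sacrificing model performance.
Larger, better-performing models come at a potentially incremental cost when appropriate optimizations are used.
Combining data-distributed and mixed-precision training makes it practical to explore the model design space for this problem.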
Abstract
Deep neural networks have shown great success in many diverse fields. Training these networks can take significant amounts of time, compute, and energy. As datasets get larger and models become more complex, the exploration of model architectures becomes prohibitive. In this paper we examine the compute, energy, and time costs of training a UNet-based deep neural network for the problem of predicting short-term weather forecasts (called precipitation nowcasting). By leveraging a combination of data-distributed and mixed-precision training, we explore the design space for this problem. We also show that larger models with better performance come at a potentially incremental cost if appropriate optimizations are used. We show that it is possible to achieve a significant improvement in training time by leveraging mixed-precision training without sacrificing model performance.…
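The abstract does not say how mixed precision is implemented. As background only, automatic mixed precision generally relies on loss scaling to keep small gradients from underflowing in half precision. The sketch below illustrates that one idea with the Python standard library. The gradient value, the scale factor, and the helper `to_fp16` are hypothetical and not taken from the paper.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical gradient magnitude, chosen to lie below the smallest
# positive half-precision value (about 6e-8).
true_grad = 1e-8

# Illustrative scale factor (an assumption, not a value from the paper).
loss_scale = 65536.0

# Without scaling, the gradient rounds to zero when stored in fp16.
unscaled = to_fp16(true_grad)

# With scaling, the scaled gradient is representable in fp16. It is
# divided by the scale in full precision before the weight update.
scaled = to_fp16(true_grad * loss_scale)
recovered = scaled / loss_scale

print("fp16 without scaling:", unscaled)
print("fp16 with scaling, then unscaled:", recovered)
```

In a real framework the scale is usually adjusted dynamically during training. This sketch fixes it to keep the arithmetic visible.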
