Compute, Time and Energy Characterization of Encoder-Decoder Networks with Automatic Mixed Precision Training

Siddharth Samsi; Michael Jones; Mark M. Veillette

arXiv:2008.08062 · cs.DC · September 8, 2021

TL;DR

This paper analyzes the compute, time, and energy costs of training UNet-based neural networks for weather forecasting, demonstrating that mixed-precision training significantly reduces training time and energy consumption while enabling larger models.

Contribution

The paper presents a systematic exploration of data-distributed and mixed-precision training for UNet models, showing how these optimizations affect training time, compute, and energy use for precipitation nowcasting.

Findings

01. Mixed-precision training significantly reduces training time without sacrificing model performance.

02. Larger models with more parameters can be trained at a moderate increase in energy cost.

03. Appropriate optimizations, such as data-distributed and mixed-precision training, improve the trade-off between model performance and training cost.

Abstract

Deep neural networks have shown great success in many diverse fields. The training of these networks can take significant amounts of time, compute and energy. As datasets get larger and models become more complex, the exploration of model architectures becomes prohibitive. In this paper we examine the compute, energy and time costs of training a UNet-based deep neural network for the problem of predicting short-term weather forecasts (called precipitation nowcasting). By leveraging a combination of data-distributed and mixed-precision training, we explore the design space for this problem. We also show that larger models with better performance come at a potentially incremental cost if appropriate optimizations are used. We show that it is possible to achieve a significant improvement in training time by leveraging mixed-precision training without sacrificing model performance.…
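
As a rough illustration of why mixed-precision training is usually paired with loss scaling, the sketch below simulates half-precision storage with Python's standard struct module. It is not the authors' code: the helper names to_fp16 and fp16_gradient, the example gradient value, and the static scale of 1024 are assumptions chosen for illustration, and automatic mixed precision in deep learning frameworks typically adjusts the scale dynamically rather than fixing it.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The "e" format code packs a float as a 16-bit half-precision number.
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Assumed static loss scale (hypothetical value for this sketch).
LOSS_SCALE = 1024.0


def fp16_gradient(grad: float, scale: float = 1.0) -> float:
    """Simulate a gradient stored in fp16 after loss scaling, then unscaled.

    Scaling the loss by `scale` scales every gradient by the same factor,
    so the fp16 value holds grad * scale. The division happens in full
    precision, as it would before a weight update.
    """
    stored = to_fp16(grad * scale)
    return stored / scale


# A gradient below the smallest positive half-precision value (hypothetical).
tiny_grad = 1e-8

unscaled = fp16_gradient(tiny_grad)             # no loss scaling
scaled = fp16_gradient(tiny_grad, LOSS_SCALE)   # with loss scaling

print("without loss scaling:", unscaled)
print("with loss scaling:   ", scaled)
```

In this sketch the unscaled gradient underflows to zero in half precision, while the scaled one survives up to rounding error. That is the basic reason mixed-precision training can keep model quality while doing most arithmetic in 16 bits.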
