Carbon Emissions and Large Neural Network Training

David Patterson; Joseph Gonzalez; Quoc Le; Chen Liang; Lluis-Miquel Munguia; Daniel Rothchild; David So; Maud Texier; Jeff Dean

arXiv:2104.10350 · cs.LG · April 26, 2021 · 130 cites

Open Access

TL;DR

This paper assesses the energy consumption and carbon footprint of large neural networks, highlighting opportunities for efficiency improvements and advocating for transparency and standardized metrics in ML research.

Contribution

It provides detailed estimates of energy use for recent large models, identifies key factors affecting carbon emissions, and promotes transparency and benchmarking of energy metrics in ML.

Findings

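01. Large but sparsely activated DNNs can cut energy consumption by over 90% relative to large dense DNNs without accuracy loss.

02. Location and infrastructure significantly affect energy efficiency and CO2 emissions: the fraction of carbon-free energy and the resulting CO2e vary ~5X-10X by location.

03. Jointly optimizing the choice of model, data center, and hardware can reduce the carbon footprint by up to 1000X.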

Abstract

The computation demand for machine learning (ML) has grown rapidly in recent years, which comes with a number of costs. Estimating the energy cost helps measure its environmental impact and find greener strategies, yet it is challenging without detailed information. We calculate the energy use and carbon footprint of several recent large models (T5, Meena, GShard, Switch Transformer, and GPT-3) and refine earlier estimates for the neural architecture search that found Evolved Transformer. We highlight the following opportunities to improve energy efficiency and CO2 equivalent emissions (CO2e): Large but sparsely activated DNNs can consume <1/10th the energy of large, dense DNNs without sacrificing accuracy, despite using as many or even more parameters. Geographic location matters for ML workload scheduling, since the fraction of carbon-free energy and the resulting CO2e vary ~5X-10X, even within…
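As a rough illustration of the kind of estimate the abstract describes, the sketch below multiplies training time, processor count, average power per processor, and data center PUE (power usage effectiveness) to get energy, then multiplies by the grid's carbon intensity to get CO2e. The function names and every number in the example are hypothetical placeholders chosen for illustration, not figures from the paper.

```python
def training_energy_kwh(hours, num_processors, avg_power_watts, pue):
    """Estimate data-center-level training energy in kWh.

    hours            -- wall-clock training time
    num_processors   -- number of accelerators (or CPUs) used
    avg_power_watts  -- average measured power per processor, in watts
    pue              -- power usage effectiveness of the data center (>= 1.0)
    """
    return hours * num_processors * avg_power_watts * pue / 1000.0


def emissions_tco2e(energy_kwh, kg_co2e_per_kwh):
    """Convert energy (kWh) to metric tons of CO2e for a given grid intensity."""
    return energy_kwh * kg_co2e_per_kwh / 1000.0


if __name__ == "__main__":
    # Hypothetical training run: all values below are made up for illustration.
    energy = training_energy_kwh(hours=100, num_processors=512,
                                 avg_power_watts=300, pue=1.1)

    # Two hypothetical grids whose carbon intensity differs by 5X.
    for name, intensity in [("higher-carbon grid", 0.40),
                            ("lower-carbon grid", 0.08)]:
        tons = emissions_tco2e(energy, intensity)
        print(f"{name}: {energy:.0f} kWh -> {tons:.2f} tCO2e")
```

Because the estimate is a plain product of factors, the same training run scheduled on a grid with one fifth the carbon intensity yields one fifth the CO2e, which is why location appears among the opportunities listed above.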

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Explainable Artificial Intelligence (XAI)

Methods: Linear Layer · Switch FFN · Switch Transformer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Meena · GShard · Softmax · Layer Normalization · Label Smoothing