Training Neural Networks in Single vs Double Precision

Tomas Hrycej; Bernhard Bermeitinger; Siegfried Handschuh

arXiv:2209.07219·cs.LG·November 1, 2022

TL;DR

This paper compares single- and double-precision arithmetic in neural network training. Double precision benefits the Conjugate Gradient (CG) method, a second-order algorithm, on nonlinear tasks, while first-order methods such as RMSprop do not benefit from the higher precision.

Contribution

It provides an empirical evaluation of how computing precision impacts the optimization performance of different neural network training algorithms.

Findings

01. Double precision improves second-order method (CG) performance on nonlinear tasks.

02. First-order methods like RMSprop are unaffected by precision differences.

03. CG with double precision yields better solutions for complex, nonlinear problems.

Abstract

The commitment to single-precision floating-point arithmetic is widespread in the deep learning community. To evaluate whether this commitment is justified, the influence of computing precision (single and double precision) on the optimization performance of the Conjugate Gradient (CG) method (a second-order optimization algorithm) and RMSprop (a first-order algorithm) has been investigated. Tests of neural networks with one to five fully connected hidden layers and moderate or strong nonlinearity with up to 4 million network parameters have been optimized for Mean Square Error (MSE). The training tasks have been set up so that their MSE minimum was known to be zero. Computing experiments have disclosed that single-precision can keep up (with superlinear convergence) with double-precision as long as line search finds an improvement. First-order methods such as RMSprop do not benefit…
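As a rough illustration of two ideas in the abstract, the sketch below (NumPy, not the authors' code) builds a task whose MSE minimum is zero and checks whether a small loss decrease can be resolved in each precision. The teacher-network construction, the helper names `forward`, `make_task`, `mse`, and `step_is_improvement`, and all sizes are assumptions made for this example; the abstract does not describe how the paper's tasks were built.

```python
import numpy as np


def forward(X, W1, W2):
    """One hidden tanh layer followed by a linear output layer."""
    return np.tanh(X @ W1) @ W2


def make_task(dtype, n_samples=64, n_in=8, n_hidden=16, seed=0):
    """Build a regression task whose MSE minimum is zero by construction.

    Assumption for this sketch: targets are produced by a 'teacher' network
    with the same architecture as the model, so the teacher weights reach
    an MSE of exactly zero.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_in)).astype(dtype)
    W1 = rng.standard_normal((n_in, n_hidden)).astype(dtype)
    W2 = rng.standard_normal((n_hidden, 1)).astype(dtype)
    Y = forward(X, W1, W2)  # targets computed in the same precision
    return X, Y, (W1, W2)


def mse(X, Y, W1, W2):
    """Mean square error of the network output against the targets."""
    residual = forward(X, W1, W2) - Y
    return np.mean(residual * residual)


def step_is_improvement(loss, decrease, dtype):
    """Check whether `loss - decrease` is a strictly smaller number in `dtype`.

    A line search that compares loss values can only accept a step if the
    decrease survives rounding in the working precision.
    """
    old = dtype(loss)
    new = old - dtype(decrease)
    return bool(new < old)


if __name__ == "__main__":
    for dtype in (np.float32, np.float64):
        X, Y, (W1, W2) = make_task(dtype)
        print(dtype.__name__,
              "MSE at teacher weights:", float(mse(X, Y, W1, W2)),
              "| machine epsilon:", float(np.finfo(dtype).eps),
              "| 1e-9 decrease from 1.0 resolved:",
              step_is_improvement(1.0, 1e-9, dtype))
```

With a loss of 1.0 and a decrease of 1e-9, the comparison fails in float32 (the spacing of representable values just below 1.0 is about 6e-8) but succeeds in float64. This is one plausible way a line search could stop finding improvements in single precision; it is a toy check, not a reproduction of the paper's experiments.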

Taxonomy

Topics: Image and Signal Denoising Methods · Model Reduction and Neural Networks · Adaptive optics and wavefront sensing

Methods: RMSProp