The Hidden Power of Pure 16-bit Floating-Point Neural Networks

Juyoung Yun; Byungkon Kang; Zhoulai Fu

arXiv:2301.12809 · cs.LG · May 6, 2024 · 1 cites

TL;DR

This paper demonstrates that pure 16-bit neural networks can outperform 32-bit models in classification tasks, supported by extensive experiments and theoretical analysis, challenging the assumption that lower precision always harms performance.

Contribution

It is the first comprehensive study of pure 16-bit neural networks, showing their unexpected advantages over 32-bit models through experiments and theoretical insights.

Findings

01 Pure 16-bit networks outperform 32-bit networks in certain classification tasks

02 Theoretical analysis supports the empirical performance gains

03 Low-precision training can be detrimental in some scenarios

Abstract

Lowering the precision of neural networks from the prevalent 32-bit precision has long been considered harmful to performance, despite the gain in space and time. Many works propose various techniques to implement half-precision neural networks, but none study pure 16-bit settings. This paper investigates the unexpected performance gain of pure 16-bit neural networks over the 32-bit networks in classification tasks. We present extensive experimental results that favorably compare the performance of various 16-bit neural networks to that of the 32-bit models. In addition, a theoretical analysis of the efficiency of 16-bit models is provided, which is coupled with empirical evidence to back it up. Finally, we discuss situations in which low-precision training is indeed detrimental.
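
The trade-off the abstract refers to (less space, coarser arithmetic) can be made concrete with a small sketch. It is not taken from the paper and does not reproduce its experiments or its theoretical analysis; it only compares the IEEE 754 16-bit and 32-bit formats using Python's standard `struct` module. The helper names, the parameter count, and the update size are hypothetical values chosen for illustration.

```python
import struct


def round_to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 (half precision) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_float32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 (single precision) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Space: bytes needed to store one value in each format.
BYTES_FP16 = struct.calcsize("<e")  # 2
BYTES_FP32 = struct.calcsize("<f")  # 4

# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 1_000_000
size_fp16_mb = NUM_PARAMS * BYTES_FP16 / 1e6
size_fp32_mb = NUM_PARAMS * BYTES_FP32 / 1e6

# Precision: gap between 1.0 and the next representable value.
EPS_FP16 = 2.0 ** -10  # binary16 stores 10 fraction bits
EPS_FP32 = 2.0 ** -23  # binary32 stores 23 fraction bits

# Hypothetical weight update, smaller than half the 16-bit gap at 1.0.
update = 1e-4
w_fp16 = round_to_float16(1.0 + update)  # rounds back to exactly 1.0
w_fp32 = round_to_float32(1.0 + update)  # remains strictly above 1.0

if __name__ == "__main__":
    print("MB for parameters:", size_fp16_mb, "(16-bit) vs", size_fp32_mb, "(32-bit)")
    print("gap above 1.0:", EPS_FP16, "(16-bit) vs", EPS_FP32, "(32-bit)")
    print("update kept in 16-bit:", w_fp16 != 1.0)
    print("update kept in 32-bit:", w_fp32 != 1.0)
```

In this sketch the 16-bit format halves the storage but silently drops the small update, which is the usual reason lower precision is expected to hurt. The abstract's claim is that, in the classification tasks studied, pure 16-bit networks nonetheless compare favorably with 32-bit ones.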

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Neural Networks and Reservoir Computing

Methods: None