Revisiting 16-bit Neural Network Training: A Practical Approach for Resource-Limited Learning

Juyoung Yun; Sol Choi; Francois Rameau; Byungkon Kang; Zhoulai Fu

arXiv:2305.10947 · cs.LG · April 20, 2026 · 1 citation

TL;DR

This paper validates, through theoretical analysis and extensive experiments, that neural networks trained in 16-bit precision can match 32-bit accuracy, making 16-bit training a practical option when resources are limited.

Contribution

The paper provides the first systematic validation and theoretical formalization showing that 16-bit precision can achieve results comparable to 32-bit precision in neural network training.

Findings

01  16-bit neural networks match 32-bit accuracy

02  Theoretical analysis explains floating-point errors in 16-bit precision

03  16-bit training boosts computational speed on GPUs
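
To make the floating-point error in finding 02 concrete, the sketch below rounds a few values to 16-bit and 32-bit precision and compares the relative rounding error against the unit roundoff of each format. It is an illustration of standard IEEE 754 rounding, not the paper's own analysis. It assumes "16-bit" means IEEE 754 binary16 (the page does not say which 16-bit format is used), and the helper names `round_to_fp16`, `round_to_fp32`, and `relative_error` are hypothetical.

```python
import struct

# Unit roundoff u = 2**-p for round-to-nearest, where p is the number of
# significand bits including the implicit leading bit.
U_FP16 = 2.0 ** -11   # binary16: 10 stored bits + 1 implicit
U_FP32 = 2.0 ** -24   # binary32: 23 stored bits + 1 implicit


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest binary16 value (assumed format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def relative_error(exact: float, rounded: float) -> float:
    """Relative error of `rounded` with respect to a nonzero `exact`."""
    return abs(rounded - exact) / abs(exact)


if __name__ == "__main__":
    # Values chosen inside the normal range of binary16.
    for v in (0.1, 1.0 / 3.0, 3.14159, 1234.5678):
        e16 = relative_error(v, round_to_fp16(v))
        e32 = relative_error(v, round_to_fp32(v))
        print(f"x={v:<12} fp16 rel. err={e16:.3e}  fp32 rel. err={e32:.3e}")
    print(f"unit roundoff: fp16={U_FP16:.3e}  fp32={U_FP32:.3e}")
```

For values in each format's normal range, a single rounding introduces a relative error of at most the unit roundoff, so one 16-bit rounding can be up to 2^13 times larger than one 32-bit rounding. Whether that gap affects final model accuracy is the question the paper examines.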

Abstract

With the increasing complexity of machine learning models, managing computational resources like memory and processing power has become a critical concern. Mixed precision techniques, which leverage different numerical precisions during model training and inference to optimize resource usage, have been widely adopted. However, access to hardware that supports lower precision formats (e.g., FP8 or FP4) remains limited, especially for practitioners with hardware constraints. For many with limited resources, the available options are restricted to using 32-bit, 16-bit, or a combination of the two. While it is commonly believed that 16-bit precision can achieve results comparable to full (32-bit) precision, this study is the first to systematically validate this assumption through both rigorous theoretical analysis and extensive empirical evaluation. Our theoretical formalization of…
