To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability

Joonhyung Lee; Jeongin Bae; Byeongwook Kim; Se Jung Kwon; Dongsoo Lee

arXiv:2405.18710 · cs.LG · March 26, 2025 · 1 citation

TL;DR

This paper investigates the stability and robustness of FP8 reduced-precision training for large language models, highlighting current limitations and proposing new evaluation methods to guide future research.

Contribution

It introduces new evaluation techniques and a metric for loss landscape sharpness, analyzing the impact of reduced precision on LLM training stability.

Findings

01. FP8 training methods are not yet robust enough for cost-effective use.

02. Reduced precision affects training stability across seeds, learning rates, and datasets.

03. Simulation of bit reductions reveals the relationship between precision and stability.
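
To make the idea of simulating bit reductions concrete, here is a minimal sketch in Python with NumPy. It is not the authors' procedure: it assumes float32 inputs, drops low-order mantissa bits by truncation rather than rounding, and leaves the exponent range untouched. The function name `truncate_mantissa` and its `kept_bits` parameter are hypothetical names chosen for this illustration.

```python
import numpy as np

def truncate_mantissa(x, kept_bits):
    """Keep only the top `kept_bits` explicit mantissa bits of float32 values.

    Illustrative assumption: the dropped bits are zeroed (truncation toward
    zero), which is simpler than the round-to-nearest used by real hardware.
    """
    if not 0 <= kept_bits <= 23:
        raise ValueError("float32 has 23 explicit mantissa bits")
    # ascontiguousarray guarantees a float32 array with ndim >= 1.
    x = np.ascontiguousarray(x, dtype=np.float32)
    drop = 23 - kept_bits
    # Mask that preserves the sign, the exponent, and the top mantissa bits.
    mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    bits = x.view(np.uint32) & mask
    return bits.view(np.float32)

values = np.array([1.0 + 2.0**-7, 1.0 + 2.0**-8, -(1.0 + 2.0**-8)],
                  dtype=np.float32)
reduced = truncate_mantissa(values, kept_bits=7)
```

With `kept_bits=7` the mantissa width matches that of BF16, so the `2**-8` term in the second and third values is lost while the `2**-7` term in the first survives. Sweeping `kept_bits` downward is one simple way to observe how numerical error grows as precision shrinks.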

Abstract

The massive computational costs associated with large language model (LLM) pretraining have spurred great interest in reduced-precision floating-point representations to accelerate the process. As a result, the BrainFloat16 (BF16) precision has become the de facto standard for LLM training, with hardware support included in recent generations of accelerators. This trend has gone even further in the latest processors, where FP8 has recently been introduced. However, prior experience with FP16, which was found to be less stable than BF16, raises concerns as to whether FP8, with even fewer bits than FP16, can be a cost-effective option for LLM training. We argue that reduced-precision training schemes must have similar training stability and hyperparameter sensitivities to their higher-precision counterparts in order to be cost-effective. However, we find that currently available methods…
