SNIP: An Adaptive Mixed Precision Framework for Subbyte Large Language Model Training
Yunjie Pan, Yongyi Yang, Hanmei Yang, Scott Mahlke

TL;DR
SNIP is an adaptive mixed-precision training framework for large language models that optimizes layerwise precision to reduce computational cost while maintaining model quality, especially with subbyte precision support.
Contribution
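SNIP introduces an adaptive layerwise precision optimization method that combines two quantization-sensitivity metrics (loss divergence and weight divergence) with integer linear programming (ILP) to improve training efficiency and stability for large language models.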
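To make the idea of choosing a precision per layer under a cost budget concrete, here is a minimal sketch. It is not the paper's formulation, which this summary does not show. The layer names, format names, relative FLOP costs, and quality penalties are all hypothetical, and the tiny problem is solved by exhaustive search rather than an ILP solver.

```python
from itertools import product

# Hypothetical per-layer options: (format name, relative FLOP cost, estimated quality penalty).
# All numbers are invented for illustration; none come from the paper.
OPTIONS = {
    "layer0": [("fp8", 1.0, 0.01), ("fp4", 0.5, 0.08)],
    "layer1": [("fp8", 1.0, 0.02), ("fp4", 0.5, 0.03)],
    "layer2": [("fp8", 1.0, 0.01), ("fp4", 0.5, 0.20)],
}


def choose_precisions(options, flop_budget):
    """Pick one format per layer, minimizing total penalty subject to total cost <= flop_budget.

    Returns (penalty, cost, assignment) or None if no assignment fits the budget.
    Exhaustive search is only practical for a handful of layers.
    """
    layers = list(options)
    best = None
    for combo in product(*(options[layer] for layer in layers)):
        cost = sum(c for _, c, _ in combo)
        penalty = sum(p for _, _, p in combo)
        if cost <= flop_budget and (best is None or penalty < best[0]):
            assignment = {layer: name for layer, (name, _, _) in zip(layers, combo)}
            best = (penalty, cost, assignment)
    return best


result = choose_precisions(OPTIONS, flop_budget=2.0)
print(result)
```

With these made-up numbers, the budget forces at least two layers into the cheaper format, and the search keeps the most sensitive layer (the one with the largest penalty) at the higher precision. A real system with many layers would hand the same objective and constraint to an ILP solver instead of enumerating every combination.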
Findings
Reduces FLOPs by up to 80% compared to baselines.
Maintains model quality across various model sizes and training phases.
Supports subbyte precision with minimal overhead.
Abstract
Training large language models (LLMs) efficiently while preserving model quality poses significant challenges, particularly with subbyte precision supported by state-of-the-art GPUs. Current mixed-precision training approaches either apply uniform precision to all GEMM operations or rely on heuristic-based methods that fail to generalize during training, leading to suboptimal convergence and instability. To address these challenges, this paper introduces SNIP, a fine-grained adaptive mixed-precision training framework for LLM pretraining that supports subbyte precision. SNIP periodically collects statistics on activations, gradients, and optimizer states to assess the precision loss impact on model quality. We define two key metrics: loss divergence in the forward pass, caused by quantization-induced increases in training loss, and weight divergence in the backward pass, which measures…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
