HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs
Saleh Ashkboos, Mahdi Nikdan, Soroush Tabesh, Roberto L. Castro, Torsten Hoefler, Dan Alistarh

TL;DR
HALO is a quantization-aware training method for LLMs that uses Hadamard rotations to mitigate outliers, enabling low-precision fine-tuning that stays close to full-precision accuracy while running up to 1.41x faster on RTX 4090 GPUs.
Contribution
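The paper presents HALO, an approach that combines Hadamard rotations in the forward and backward passes, high-performance kernels, and Fully Sharded Data Parallel (FSDP) integration with low-precision communication for efficient low-precision LLM training and fine-tuning.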
Findings
Achieves near-full-precision accuracy during fine-tuning.
Provides up to 1.41x speedup on RTX 4090 GPUs.
Supports both standard and parameter-efficient fine-tuning.
Abstract
Quantized training of Large Language Models (LLMs) remains an open challenge, as maintaining accuracy while performing all matrix multiplications in low precision has proven difficult. This is particularly the case when fine-tuning pre-trained models, which can have large weight and activation outlier values that make lower-precision optimization difficult. To address this, we present HALO, a novel quantization-aware training approach for Transformers that enables accurate and efficient low-precision training by combining 1) strategic placement of Hadamard rotations in both forward and backward passes, which mitigate outliers, 2) high-performance kernel support, and 3) FSDP integration for low-precision communication. Our approach ensures that all large matrix multiplications during the forward and backward passes are executed in lower precision. Applied to LLAMA-family models, HALO…
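To make the outlier argument concrete, the following is a minimal NumPy sketch of the identity that Hadamard rotations rely on: for an orthogonal matrix H, (XH)(HᵀW) = XW, so rotating both operands leaves the product unchanged in exact arithmetic while spreading a large outlier channel across all channels before quantization. This is not HALO's implementation. The 8-bit symmetric per-tensor quantizer, the matrix sizes, the synthetic outlier channel, and the helper names `hadamard` and `quantize_int8` are all illustrative assumptions; the paper's actual rotation placement and kernels are not reproduced here.

```python
import numpy as np

def hadamard(n):
    """Normalized n x n Hadamard matrix via the Sylvester construction.

    n must be a power of two. The result is orthogonal: H @ H.T == I.
    """
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_int8(x):
    """Symmetric per-tensor fake quantization to 8-bit levels (an assumption,
    chosen only for illustration)."""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).clip(-127, 127) * scale

rng = np.random.default_rng(0)
n = 64
X = rng.normal(size=(16, n))   # hypothetical activations
X[:, 3] *= 50.0                # synthetic outlier channel
W = rng.normal(size=(n, 32))   # hypothetical weights

H = hadamard(n)
exact = X @ W

# Quantize the operands directly: the outlier inflates the quantization scale.
plain = quantize_int8(X) @ quantize_int8(W)

# Rotate first, then quantize: (X H)(H^T W) equals X W in exact arithmetic.
rotated = quantize_int8(X @ H) @ quantize_int8(H.T @ W)

err_plain = np.linalg.norm(plain - exact) / np.linalg.norm(exact)
err_rot = np.linalg.norm(rotated - exact) / np.linalg.norm(exact)
print(f"relative error without rotation: {err_plain:.4f}")
print(f"relative error with rotation:    {err_rot:.4f}")
```

In this toy setting the rotated product has the smaller relative error, because the outlier no longer forces a coarse quantization step on every other channel. The dense matrix product with H is used here for clarity only; fast Hadamard transforms compute the same rotation in O(n log n) per row.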
Taxonomy
Topics: Advanced Surface Polishing Techniques · Iterative Learning Control Systems · VLSI and Analog Circuit Testing
