FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic

Kanghyun Choi; Hyeyoon Lee; SunJong Park; Dain Kwon; Jinho Lee

arXiv:2510.24061 · cs.LG · October 29, 2025

TL;DR

FALQON is a framework that accelerates LoRA fine-tuning of large language models by merging the adapters directly into an FP8-quantized backbone. This removes the quantization overhead of a separate LoRA path and gives roughly a 3× speedup over existing quantized LoRA methods without sacrificing accuracy.

Contribution

FALQON introduces a novel method to eliminate quantization overhead in LoRA fine-tuning by merging adapters into FP8-quantized models, enabling faster training.
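The core idea is to fold the LoRA product into the backbone weight so that only one FP8 GEMM runs. A minimal pure-Python sketch can illustrate it. The sketch makes several assumptions. It uses the E4M3 FP8 format (max finite value 448) with simple per-tensor scaling. It also uses fake quantization (quantize, then dequantize back to float) in place of real FP8 kernels. Every name in it is hypothetical. It does not reproduce FALQON's reformulated forward and backward passes or its row-wise proxy update.

```python
import math

# Assumed format: FP8 E4M3 "fn" variant (bias 7, no infinities, max finite 448).
E4M3_MAX = 448.0
E4M3_MIN_EXP = -6      # exponent of the smallest normal value, 2**-6
E4M3_MANT_BITS = 3     # explicit mantissa bits


def round_to_e4m3(x: float) -> float:
    """Round to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)                      # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)            # clamp so tiny values use subnormal spacing
    step = 2.0 ** (exp - E4M3_MANT_BITS)      # gap between neighboring E4M3 values here
    q = min(round(a / step) * step, E4M3_MAX)
    return math.copysign(q, x)


def fp8_fake_quant(mat):
    """Per-tensor scaled quantize-then-dequantize of a list-of-lists matrix."""
    amax = max(abs(v) for row in mat for v in row)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [[round_to_e4m3(v / scale) * scale for v in row] for row in mat]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matadd(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical toy layer: y = x @ W with a rank-1 LoRA adapter A @ B.
W = [[0.50, -0.25, 0.75],
     [1.00, 0.10, -0.60]]      # frozen backbone weight, shape (d_in=2, d_out=3)
A = [[0.20], [-0.40]]          # LoRA down-projection, shape (d_in=2, r=1)
B = [[0.30, 0.10, -0.20]]      # LoRA up-projection, shape (r=1, d_out=3)
x = [[1.0, 2.0]]               # one input token, shape (1, d_in=2)

# Merge the adapter into the backbone, quantize once, and run a single GEMM
# instead of the two extra small GEMMs of a separate LoRA path.
W_merged_q = fp8_fake_quant(matadd(W, matmul(A, B)))
y_merged = matmul(x, W_merged_q)
y_ref = matmul(x, matadd(W, matmul(A, B)))   # full-precision reference
print(y_merged, y_ref)

# Spacing near 1.0 is 2**-3, so an update smaller than half a step vanishes.
print(round_to_e4m3(1.0 + 0.01))
```

The final line shows a general property of low-bit formats that any scheme updating an FP8 weight in place must contend with. Near 1.0, E4M3 values are 0.125 apart, so an update of 0.01 rounds away entirely. This sketch makes no claim about how FALQON addresses that.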

Findings

1. Achieves approximately a 3× training speedup over existing quantized LoRA methods.

2. Maintains accuracy comparable to existing methods.

3. Provides an end-to-end FP8 workflow, removing the need for post-training quantization.

Abstract

Low-bit floating-point (FP) formats, such as FP8, provide significant acceleration and memory savings in model training thanks to native hardware support on modern GPUs and NPUs. However, our analysis shows that FP8 quantization offers speedup primarily for large-dimensional matrix multiplications, while inherent quantization overheads diminish the speedup when it is applied to low-rank adaptation (LoRA), which uses small-dimensional matrices for efficient fine-tuning of large language models (LLMs). To address this limitation, we propose FALQON, a novel framework that eliminates the quantization overhead from separate LoRA computational paths by directly merging LoRA adapters into an FP8-quantized backbone during fine-tuning. Furthermore, we reformulate the forward and backward computations for merged adapters to significantly reduce quantization overhead, and introduce a row-wise proxy update mechanism…
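A rough cost model of our own, not the paper's analysis, shows why small LoRA matrices benefit less from FP8. The model counts the elements that must be cast to FP8 per GEMM FLOP. It assumes every FP8 GEMM casts its own operands, as happens when the LoRA path runs as separate FP8 linear layers. It ignores the backward pass, memory bandwidth, and kernel-launch costs. The function names and sizes are hypothetical.

```python
# Back-of-the-envelope model (assumption): every FP8 GEMM casts its own operands,
# and we compare "elements cast to FP8" against GEMM FLOPs.

def dense_ratio(tokens: int, d_in: int, d_out: int) -> float:
    """Elements cast to FP8 per GEMM FLOP for the backbone projection x @ W."""
    cast = tokens * d_in + d_in * d_out        # cast input x and weight W
    flops = 2 * tokens * d_in * d_out          # multiply-adds of x @ W
    return cast / flops


def lora_ratio(tokens: int, d_in: int, d_out: int, rank: int) -> float:
    """Same metric for a separate LoRA path h = x @ A, then h @ B."""
    cast = (tokens * d_in + d_in * rank        # cast x and A for the down-projection
            + tokens * rank + rank * d_out)    # cast h and B for the up-projection
    flops = 2 * tokens * d_in * rank + 2 * tokens * rank * d_out
    return cast / flops


# Hypothetical sizes: 2048 tokens, a 4096 x 4096 projection, LoRA rank 16.
dense = dense_ratio(2048, 4096, 4096)
lora = lora_ratio(2048, 4096, 4096, 16)
print(f"dense: {dense:.2e} cast/FLOP, LoRA: {lora:.2e} cast/FLOP, {lora / dense:.1f}x higher")
```

With d_in = d_out = d, the dense ratio is about 1/(2d) + 1/(2·tokens). The LoRA ratio is dominated by casting x and falls only as roughly 1/(4·rank). With the sizes above, the separate LoRA path does about 43× more casting work per FLOP than the backbone GEMM.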


Videos

FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic (SlidesLive talk)