LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning

Junyu Chen; Junzhuo Li; Zhen Peng; Wenjie Wang; Yuxiang Ren; Long Shi; Xuming Hu

arXiv:2505.18724 · cs.LG · September 30, 2025

TL;DR

LoTA-QAF introduces a lossless ternary adaptation method for quantization-aware fine-tuning of large language models, enabling efficient merging of adaptation weights into quantized models and improving performance on downstream tasks.

Contribution

The paper presents a novel lossless ternary adaptation technique that allows all quantized weights to be adjusted and merged without accuracy loss during fine-tuning.

Findings

01 Effectively recovers the performance of quantized models on the MMLU benchmark.

02 Achieves larger accuracy improvements than 16-bit LoRA.

03 Effectiveness is validated across multiple LLM families.

Abstract

Quantization and fine-tuning are crucial for deploying large language models (LLMs) on resource-constrained edge devices. However, fine-tuning quantized models presents significant challenges, for three main reasons. First, the data types of the low-precision quantized weights (e.g., 4-bit) and the high-precision adaptation weights (e.g., 16-bit) are mismatched, which limits the computational efficiency advantage that quantized weights offer during inference. Second, merging these high-precision adaptation weights into the low-precision quantized weights can degrade accuracy, because the adaptation weights often have to be approximated or truncated. Third, as far as we know, no existing method supports lossless merging of adaptation weights while adjusting all quantized weights. To address these challenges, we introduce lossless ternary adaptation for…
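The merging problem described in the abstract can be illustrated with a small toy example. The sketch below is not the paper's algorithm (the abstract is cut off before the method is described); it only contrasts two ways of updating 4-bit integer weights under assumed conditions: a signed grid of [-8, 7], a single shared scale of 0.1, and hypothetical helper names (`quantize`, `merge_float_delta`, `merge_ternary`). A high-precision update has to be re-quantized after merging, which introduces rounding error, while an update restricted to {-1, 0, 1} stays on the integer grid.

```python
import random

QMIN, QMAX = -8, 7  # assumed signed 4-bit grid


def quantize(w, scale):
    """Round a float weight to the nearest 4-bit integer level."""
    return max(QMIN, min(QMAX, round(w / scale)))


def dequantize(q, scale):
    """Map an integer level back to a float weight."""
    return q * scale


def merge_float_delta(q, delta, scale):
    """Merge a high-precision update: dequantize, add, then re-quantize."""
    return quantize(dequantize(q, scale) + delta, scale)


def merge_ternary(q, t):
    """Merge a ternary update t in {-1, 0, 1} directly on the integer grid."""
    assert t in (-1, 0, 1)
    return max(QMIN, min(QMAX, q + t))


random.seed(0)
scale = 0.1  # assumed shared quantization scale
# Stay one step inside the grid bounds so a ternary step never clips.
q_weights = [random.randint(QMIN + 1, QMAX - 1) for _ in range(8)]
float_deltas = [random.uniform(-0.08, 0.08) for _ in range(8)]
ternary_steps = [random.choice((-1, 0, 1)) for _ in range(8)]

# Float merge: distance between the merged weight and the intended base + delta.
float_err = [
    abs(dequantize(merge_float_delta(q, d, scale), scale) - (dequantize(q, scale) + d))
    for q, d in zip(q_weights, float_deltas)
]

# Ternary merge: distance (in integer levels) from the intended q + t.
ternary_err = [abs(merge_ternary(q, t) - (q + t)) for q, t in zip(q_weights, ternary_steps)]

print("max float-merge error:", max(float_err))
print("max ternary-merge error:", max(ternary_err))
```

In this toy setting the ternary merge is exact because the merged value is itself a valid 4-bit level; the float merge loses up to half a quantization step per weight. How LoTA-QAF actually constructs and trains its ternary adaptation is not covered by this sketch.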

Code & Models

Repositories

kingdalfgoodman/lota-qaf
PyTorch · Official

Videos

LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning · SlidesLive

Taxonomy

Topics: Medical Imaging Techniques and Applications · CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing