zFLoRA: Zero-Latency Fused Low-Rank Adapters

Dhananjaya Gowda; Seoha Song; Harshith Goka; Junhyun Lee

arXiv:2510.25784·cs.CL·October 31, 2025

TL;DR

zFLoRA is a low-rank adapter method for large language models that adds zero or negligible latency overhead during inference while comparing favorably with LoRA and full fine-tuning across multiple tasks and hardware platforms.

Contribution

The paper introduces zFLoRA, a zero-latency fused low-rank adapter that eliminates most of the inference-time overhead of conventional adapters while remaining competitive in task performance.

Findings

01

zFLoRA achieves zero or negligible latency overhead on NPU and GPU platforms.

02

Experimental results show zFLoRA compares favorably against LoRA and full fine-tuning across 18 tasks.

03

zFLoRA maintains high accuracy across diverse reasoning and dialogue tasks.

Abstract

Large language models (LLMs) are increasingly deployed with task-specific adapters catering to multiple downstream applications. In such a scenario, the additional compute associated with this apparently insignificant number of adapter parameters (typically less than 1% of the base model) turns out to be disproportionately significant at inference time (up to 2.5x that of the base model). In this paper, we propose a new zero-latency fused low-rank adapter (zFLoRA) that introduces zero or negligible latency overhead on top of the base model. Experimental results on LLMs of size 1B, 3B and 7B show that zFLoRA compares favorably against popular supervised fine-tuning baselines, including low-rank adapters (LoRA) and full fine-tuning (FFT). Experiments are conducted on 18 different tasks across three different categories, namely commonsense reasoning, math reasoning and…
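The abstract excerpt does not spell out how zFLoRA's fusion works. As a rough, hypothetical illustration of why an unfused adapter costs extra inference time, and of what "fusing" can mean in general, the NumPy sketch below compares a standard LoRA forward pass, y = Wx + s·B(Ax), with a variant that stacks the down-projection A under the base weight W so that a single larger matmul yields both Wx and Ax. This stacking trick is a generic technique assumed for illustration only, not the paper's method; the function names, shapes, and scale value are all hypothetical.

```python
import numpy as np

def lora_unfused(x, W, A, B, scale):
    # Standard (unmerged) LoRA: base matmul plus two extra matmuls for the adapter.
    # x: (batch, d_in), W: (d_out, d_in), A: (r, d_in), B: (d_out, r)
    base = x @ W.T            # (batch, d_out)  base model projection
    down = x @ A.T            # (batch, r)      extra matmul 1 (down-projection)
    up = down @ B.T           # (batch, d_out)  extra matmul 2 (up-projection)
    return base + scale * up

def fuse_down_projection(W, A):
    # Hypothetical one-time preprocessing: stack A under W along the output axis.
    return np.concatenate([W, A], axis=0)    # (d_out + r, d_in)

def lora_down_fused(x, W_cat, B, d_out, scale):
    # One larger matmul produces Wx and Ax together; only B(Ax) remains separate.
    h = x @ W_cat.T                          # (batch, d_out + r)
    base, down = h[:, :d_out], h[:, d_out:]
    return base + scale * (down @ B.T)

# Usage: both paths should give numerically equivalent outputs.
rng = np.random.default_rng(0)
d_in, d_out, r, batch = 64, 48, 4, 3
x = rng.standard_normal((batch, d_in))
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))

W_cat = fuse_down_projection(W, A)
y_ref = lora_unfused(x, W, A, B, scale=0.5)
y_fused = lora_down_fused(x, W_cat, B, d_out, scale=0.5)
print(np.allclose(y_ref, y_fused))  # True
```

Even with rank r much smaller than d_out, the unfused path launches separate, sequentially dependent kernels for the adapter, which is one plausible reason a tiny parameter count can translate into noticeable latency. Fusing reduces the number of kernel launches without changing the math.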
