HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture

Taiqiang Wu; Chenchen Ding; Wenyong Zhou; Yuxin Cheng; Xincheng Feng; Shuqi Wang; Wendong Xu; Chufan Shi; Zhengwu Liu; Ngai Wong

arXiv:2502.19747 · cs.CL · March 10, 2026

TL;DR

HaLoRA is a novel hardware-aware low-rank adaptation method that enables energy-efficient deployment of large language models on hybrid compute-in-memory architectures while maintaining high accuracy.

Contribution

The paper introduces HaLoRA, a training approach that makes LoRA branches robust to RRAM weight noise on hybrid CIM architectures, cutting energy cost to about 3% of that of an Nvidia A100 GPU with minimal accuracy loss.

Findings

1. Achieves up to 22.7% improvement in average score on reasoning tasks.

2. Reduces energy cost to about 3% of that of an Nvidia A100 GPU.

3. Maintains robustness across various noise types and levels.

Abstract

Low-rank adaptation (LoRA) is a predominant parameter-efficient finetuning method for adapting large language models (LLMs) to downstream tasks. Meanwhile, Compute-in-Memory (CIM) architectures demonstrate superior energy efficiency due to their array-level parallel in-memory computing designs. In this paper, we propose deploying LoRA-finetuned LLMs on a hybrid CIM architecture (i.e., pretrained weights on energy-efficient Resistive Random-Access Memory (RRAM) and LoRA branches on noise-free Static Random-Access Memory (SRAM)), reducing the energy cost to about 3% of that of the Nvidia A100 GPU. However, the inherent noise of RRAM perturbs the stored weights and degrades performance. To address this issue, we design a novel Hardware-aware Low-rank Adaptation (HaLoRA) method. The key insight is to train a LoRA branch that is robust toward such noise and then…
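The hybrid deployment described in the abstract can be illustrated with a small sketch: the pretrained weight is read with noise (standing in for RRAM), while the LoRA factors are read exactly (standing in for SRAM). This is not the authors' implementation. The additive Gaussian noise model, the function names (`matvec`, `add_gaussian_noise`, `hybrid_forward`), and the `scale` parameter are all assumptions made for illustration; the abstract does not specify the noise model or the training procedure.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def add_gaussian_noise(matrix, std, rng):
    """Return a copy of `matrix` with i.i.d. Gaussian noise on every entry.

    Assumption: additive Gaussian noise is only a stand-in for RRAM
    non-idealities; the real device noise may look different.
    """
    return [[w + rng.gauss(0.0, std) for w in row] for row in matrix]


def hybrid_forward(x, w_pretrained, lora_a, lora_b, noise_std, rng, scale=1.0):
    """Compute y = (W + noise) x + scale * B (A x).

    W (d_out x d_in) is perturbed on every read, as if stored on RRAM.
    A (r x d_in) and B (d_out x r) are used unperturbed, as if stored on SRAM.
    """
    w_noisy = add_gaussian_noise(w_pretrained, noise_std, rng)
    base = matvec(w_noisy, x)
    low_rank = matvec(lora_b, matvec(lora_a, x))
    return [b + scale * l for b, l in zip(base, low_rank)]


# Toy example: 2x2 identity as the "pretrained" weight, rank-1 LoRA branch.
x = [1.0, 2.0]
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 1.0]]        # r x d_in
lora_b = [[0.5], [-0.5]]     # d_out x r

clean = hybrid_forward(x, w, lora_a, lora_b, noise_std=0.0, rng=random.Random(0))
noisy = hybrid_forward(x, w, lora_a, lora_b, noise_std=0.1, rng=random.Random(0))
print(clean)
print(noisy)
```

With `noise_std=0.0` the function reduces to the ordinary LoRA forward pass. Raising `noise_std` shifts only the contribution of the pretrained weight, which is the perturbation a noise-robust LoRA branch would have to compensate for.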


Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis

Methods: LLaMA