Enhancing Learned Knowledge in LoRA Adapters Through Efficient Contrastive Decoding on Ascend NPUs

Morgan Lindsay Heisler; Linzi Xing; Ge Shi; Hanieh Sadri; Gursimran Singh; Weiwei Zhang; Tao Ye; Ying Xiong; Yong Zhang; and Zhenan Fan

arXiv:2505.14620 · cs.LG · October 3, 2025

TL;DR

This paper introduces Contrastive LoRA Decoding (CoLD), an efficient decoding method that enhances task-specific knowledge utilization in LoRA-adapted large language models, improving accuracy and reducing latency on Ascend NPUs.

Contribution

The paper proposes CoLD, a novel contrastive decoding framework for LoRA models, along with an optimized kernel for Ascend NPUs, boosting performance and efficiency.

Findings

01 Up to 5.54% increase in task accuracy

02 28% reduction in end-to-end latency

03 Effective for resource-constrained environments

Abstract

Huawei Cloud users leverage LoRA (Low-Rank Adaptation) as an efficient and scalable method to fine-tune and customize large language models (LLMs) for application-specific needs. However, tasks that require complex reasoning or deep contextual understanding are often hindered by biases or interference from the base model when using typical decoding methods like greedy or beam search. These biases can lead to generic or task-agnostic responses from the base model instead of leveraging the LoRA-specific adaptations. In this paper, we introduce Contrastive LoRA Decoding (CoLD), a novel decoding framework designed to maximize the use of task-specific knowledge in LoRA-adapted models, resulting in better downstream performance. CoLD uses contrastive decoding by scoring candidate tokens based on the divergence between the probability distributions of a LoRA-adapted expert model and the…
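The scoring idea described in the abstract can be illustrated with a minimal sketch of generic contrastive decoding. This is not the paper's CoLD implementation or its Ascend kernel. It assumes that the second distribution is the base model's (the abstract is cut off at that point), that the "divergence" is a per-token log-probability difference, and that candidates are first filtered by a plausibility threshold `alpha`. The function names and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_pick(expert_logits, base_logits, alpha=0.1):
    """Pick the next token by contrasting expert and base distributions.

    Assumption: the score is log p_expert - log p_base, restricted to tokens
    whose expert probability is at least alpha * max(p_expert).
    """
    p_expert = softmax(expert_logits)
    p_base = softmax(base_logits)
    cutoff = alpha * max(p_expert)

    best_token, best_score = None, -math.inf
    for tok, (pe, pb) in enumerate(zip(p_expert, p_base)):
        if pe < cutoff:
            continue  # implausible under the expert, so never a candidate
        score = math.log(pe) - math.log(pb)
        if score > best_score:
            best_token, best_score = tok, score
    return best_token


# Toy vocabulary of four tokens (hypothetical logits).
expert = [2.0, 1.8, -3.0, 0.5]  # LoRA-adapted model
base = [2.5, 0.2, -3.0, 0.4]    # base model, assumed to be the contrast model

greedy_token = max(range(len(expert)), key=lambda i: expert[i])
contrastive_token = contrastive_pick(expert, base)
print(greedy_token, contrastive_token)
```

In this toy case greedy decoding on the adapted model picks token 0, which the base model also favors, while the contrastive score picks token 1, where the adapted model departs most from the base model. Setting `alpha=1.0` keeps only the expert's top token, so the result falls back to the greedy choice.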

Taxonomy

Methods: Balanced Selection · ALIGN