CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices

Weilin Zhao; Yuxiang Huang; Xu Han; Zhiyuan Liu; Zhengyan Zhang; Kuai Li; Chen Chen; Tao Yang; Maosong Sun

arXiv:2307.07705·cs.CL·August 8, 2024

TL;DR

CA-LoRA is a novel framework that adapts existing LoRA modules to compressed LLMs, enabling efficient multi-tasking on resource-limited personal devices by recovering lost knowledge and maintaining high performance.

Contribution

This paper introduces CA-LoRA, a method that effectively adapts existing LoRAs to compressed LLMs, improving multi-task performance on personal devices.

Findings

01. CA-LoRA outperforms vanilla LoRA on compressed LLMs.

02. CA-LoRA achieves performance comparable to non-compressed LLMs with LoRA.

03. Knowledge inheritance and recovery strategies are effective in mitigating compression loss.

Abstract

Recently, there has been a demand to deploy Large Language Models (LLMs) on personal devices such as laptops and smartphones. These LLMs have different model variants when handling different tasks. However, personal devices have limited resources and require reduced storage overhead. To address this, there are two key methods available: the first is model compression, which compresses LLMs into smaller sizes; the second is LoRA, which can transfer an LLM to other tasks with very few parameters, avoiding the storage of multiple model variants in multi-task scenarios by only preserving LoRAs. However, our experiments show that directly combining these two methods yields sub-optimal performance. Considering that the open-source community has already contributed many LoRAs to LLMs, we propose to adapt these existing LoRAs from the LLMs to their compressed version and introduce a…
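The setup the abstract describes can be made concrete with a toy sketch. It assumes a single weight matrix, uniform rounding as a stand-in for compression, and a random low-rank pair `B`, `A` as a stand-in for an existing LoRA; all function names and sizes are hypothetical. It is not the CA-LoRA method. It only shows why storing LoRAs is cheaper than storing full variants, and that reusing a LoRA on a compressed base leaves a gap equal to the compression error.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def quantize(w, step):
    """Round every weight to a multiple of `step` (toy stand-in for compression)."""
    return [[round(x / step) * step for x in row] for row in w]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def storage(d_in, d_out, rank, num_tasks):
    """Parameter counts: one full matrix per task vs. one shared base plus per-task LoRAs."""
    full = num_tasks * d_in * d_out
    lora = d_in * d_out + num_tasks * rank * (d_in + d_out)
    return full, lora


def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_out, d_in, rank = 4, 6, 2

W = rand_matrix(d_out, d_in)   # original (uncompressed) weight
B = rand_matrix(d_out, rank)   # LoRA factor, shape d_out x rank
A = rand_matrix(rank, d_in)    # LoRA factor, shape rank x d_in
delta = matmul(B, A)           # low-rank task update B @ A

W_task = add(W, delta)         # LoRA applied to the original base
W_c = quantize(W, 0.25)        # compressed base
W_c_task = add(W_c, delta)     # same LoRA reused on the compressed base

# The LoRA is unchanged, so the gap is exactly the compression error of the base.
gap = max_abs_diff(W_task, W_c_task)

full_params, lora_params = storage(4096, 4096, 8, 10)
print(f"gap between original and compressed task weights: {gap:.4f}")
print(f"10 full variants: {full_params} params; shared base + 10 LoRAs: {lora_params} params")
```

In this toy, the reused LoRA does nothing to compensate for the rounding error in the base. That loss is what the paper's knowledge inheritance and recovery strategies are aimed at, although their actual design is not shown here.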

Code & Models

Repositories

thunlp/ca-lora (official; tagged: jax)

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Ferroelectric and Negative Capacitance Devices