LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia

TL;DR
LongLoRA introduces an efficient method for extending the context size of large language models using sparse attention and parameter-efficient fine-tuning, significantly reducing computational costs while maintaining performance.
Contribution
The paper presents a novel approach combining shifted sparse attention and LoRA for efficient long-context fine-tuning of LLMs, enabling large context extension with minimal additional training complexity.
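The LoRA half of the method can be pictured as a frozen pretrained weight plus a small trainable low-rank update. Below is a minimal NumPy sketch of the standard LoRA formulation, not LongLoRA's code. The class name `LoRALinear`, the rank, `alpha`, and the initialization scales are illustrative assumptions. The reviews also credit LongLoRA's changes to the norm and embedding layers, and this sketch leaves those out.

```python
import numpy as np


class LoRALinear:
    """Hypothetical linear layer: frozen weight W plus a trainable low-rank update.

    y = x W^T + (alpha / r) * x A^T B^T, with A: (r, d_in) and B: (d_out, r).
    """

    def __init__(self, weight: np.ndarray, rank: int = 8, alpha: float = 16.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                               # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Frozen path plus scaled low-rank path; B = 0 at init, so the layer starts equal to W.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

    def merged_weight(self) -> np.ndarray:
        # Fold the update into W after training so inference has no extra cost.
        return self.weight + self.scale * self.B @ self.A
```

For a 4096 × 4096 projection with rank 8, the adapter has 2 × 8 × 4096 = 65,536 trainable parameters, compared with about 16.8 million frozen ones (roughly 0.4%).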
Findings
Extends Llama2 7B from 4k to 100k context length.
Extends Llama2 70B to a 32k context length; the shifted sparse attention used during fine-tuning takes only two lines of code.
Maintains similar performance to vanilla attention-based fine-tuning.
Abstract
We present LongLoRA, an efficient fine-tuning approach that extends the context sizes of pre-trained large language models (LLMs) with limited computation cost. Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. For example, training with a context length of 8192 requires 16× the self-attention computation of a context length of 2048. In this paper, we speed up the context extension of LLMs in two aspects. On the one hand, although dense global attention is needed during inference, fine-tuning the model can be done effectively and efficiently with sparse local attention. The proposed shifted sparse attention effectively enables context extension, leading to non-trivial computation savings with similar performance to fine-tuning with vanilla attention. In particular, it can be implemented with only two lines of…
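The abstract makes two points: self-attention cost grows quadratically with context length, and training can use local attention over shifted groups instead of dense attention. The NumPy sketch below illustrates both. It assumes a single unbatched sequence with q, k, v of shape (N, H, D), applies no causal mask, and uses a scheme in which half of the heads are shifted by half a group. All function names are hypothetical, and this is not the authors' two-line implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def group_attention(q, k, v, group_size):
    """Scaled dot-product attention inside non-overlapping groups of tokens.

    q, k, v: (N, H, D) for one sequence; N must be divisible by group_size.
    No causal mask is applied (a simplification).
    """
    n, h, d = q.shape
    g = group_size
    # (N, H, D) -> (N/G, G, H, D) -> (N/G, H, G, D)
    qg, kg, vg = (t.reshape(n // g, g, h, d).transpose(0, 2, 1, 3) for t in (q, k, v))
    scores = qg @ kg.transpose(0, 1, 3, 2) / np.sqrt(d)  # (N/G, H, G, G)
    out = softmax(scores) @ vg                            # (N/G, H, G, D)
    return out.transpose(0, 2, 1, 3).reshape(n, h, d)


def shifted_sparse_attention(q, k, v, group_size):
    """First half of the heads attend within groups; the second half within
    groups shifted by group_size // 2, so information crosses group borders."""
    n, h, _ = q.shape
    assert n % group_size == 0 and h % 2 == 0
    half, shift = h // 2, group_size // 2

    def roll_second_half(t, s):
        # Roll only the second half of the heads along the token axis (wraps around).
        return np.concatenate([t[:, :half], np.roll(t[:, half:], s, axis=0)], axis=1)

    q, k, v = (roll_second_half(t, -shift) for t in (q, k, v))
    out = group_attention(q, k, v, group_size)
    return roll_second_half(out, shift)  # undo the shift so rows match input tokens


def score_entries(n, group_size=None):
    """Attention-score entries per head: N*N for dense attention, N*G for grouped."""
    return n * n if group_size is None else n * group_size
```

Take N = 8192 with an illustrative group size G = 2048. The grouped score matrix then has 8192 × 2048 entries per head. That is 4× the dense cost at 2048 tokens, rather than the 16× that dense attention at 8192 would need. The wrap-around from `np.roll` means the last shifted group mixes tokens from the end and the start of the sequence.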
Peer Reviews
Decision: ICLR 2024 oral
- The authors propose an extremely simple method, that performs well and is applicable to existing pretrained models
- The authors evaluate only perplexity and a retrieval setting.
- The proposed method builds on previous work and shows strong empirical results on long-range language modelling and a retrieval task.
- The proposed approach is conceptually simple and can be implemented in a few lines of code (as demonstrated by the authors).
- The proposed approach can be combined with existing approaches for context extension, such as positional interpolation.
- The authors provide a detailed discussion of related work.
- The efficiency aspect of the work could be discussed more prominently in the main body of the paper.
- The presentation of the work could be improved. See below for suggestions.
(1) The method seems useful and impactful, and the evaluation is thorough with strong results. (2) The authors perform very thorough ablations and isolate key design decisions (attention shift, modifying the norm & embedding layers) that enable the method to match full fine-tuning. (3) The paper is well-written.
No major weaknesses.
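One review notes that LongLoRA can be combined with positional interpolation. A minimal NumPy sketch of the idea follows. It assumes rotary position embeddings (RoPE) with base 10000 and linear interpolation of position indices. The function names are hypothetical, and the code pairs interleaved dimensions (2i, 2i+1); implementations differ in how they pair dimensions.

```python
import numpy as np


def rope_angles(positions, dim, base=10000.0):
    """Rotation angles theta[m, i] = m * base^(-2i/dim) for i in [0, dim/2)."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)  # (dim/2,)
    return np.outer(positions, inv_freq)              # (len(positions), dim/2)


def interpolated_positions(seq_len, trained_len, target_len):
    """Linear position interpolation: scale indices by trained_len / target_len
    so every position of a target_len-long sequence falls inside [0, trained_len)."""
    scale = trained_len / target_len
    return np.arange(seq_len) * scale


def apply_rope(x, angles):
    """Rotate pairs (x[:, 2i], x[:, 2i+1]) of x: (seq, dim) by angles: (seq, dim/2)."""
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Extending a 4096-token model to 32768 tokens scales each position by 1/8. Position 8 therefore receives the rotation the model saw for position 1 during pretraining. The alternative is extrapolating to angles the model never encountered.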
Code & Models
- 🤗 Yukang/Llama-2-7b-longlora-8k
- 🤗 Yukang/Llama-2-7b-longlora-16k
- 🤗 Yukang/Llama-2-7b-longlora-32k
- 🤗 Yukang/Llama-2-13b-longlora-8k
- 🤗 Yukang/Llama-2-13b-longlora-16k
- 🤗 Yukang/Llama-2-70b-longlora-32k
- 🤗 Yukang/Llama-2-7b-longlora-8k-ft
- 🤗 Yukang/Llama-2-7b-longlora-16k-ft
- 🤗 Yukang/Llama-2-7b-longlora-32k-ft
- 🤗 Yukang/Llama-2-13b-longlora-8k-ft
