RE-Adapt: Reverse Engineered Adaptation of Large Language Models

William Fleshman; Benjamin Van Durme

arXiv:2405.15007·cs.CL·May 27, 2024

Open Access 4 Reviews

TL;DR

RE-Adapt fine-tunes large language models on new domains while preserving their instruction-following capabilities. The instruction adapter it relies on is reverse engineered without additional data or training, and the method is reported to outperform other fine-tuning methods across multiple models and datasets.

Contribution

It introduces a reverse-engineered adapter, obtained without additional data or training, that restores instruction following after the base model has been fine-tuned on a new domain.

Findings

01 RE-Adapt outperforms other fine-tuning methods.

02 It works across multiple large language models.

03 It maintains instruction-following without additional data.

Abstract

We introduce RE-Adapt, an approach to fine-tuning large language models on new domains without degrading any pre-existing instruction-tuning. We reverse engineer an adapter which isolates what an instruction-tuned model has learned beyond its corresponding pretrained base model. Importantly, this requires no additional data or training. We can then fine-tune the base model on a new domain and readapt it to instruction following with the reverse engineered adapter. RE-Adapt and our low-rank variant LoRE-Adapt both outperform other methods of fine-tuning, across multiple popular LLMs and datasets, even when the models are used in conjunction with retrieval-augmented generation.
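The three steps in the abstract (isolate the adapter, fine-tune the base model, readapt) can be sketched with plain weight arithmetic. This is a minimal illustration, not the authors' implementation: it assumes the adapter is the per-parameter difference between instruction-tuned and base weights, it stands in for domain fine-tuning with a random perturbation, and it assumes the low-rank variant truncates an SVD of that difference. The function names, the `scale` parameter, and the toy weights are all hypothetical.

```python
import numpy as np

def reverse_engineer_adapter(instruct, base):
    """Assumed adapter: per-parameter difference between instruct and base weights."""
    return {name: instruct[name] - base[name] for name in base}

def readapt(finetuned_base, adapter, scale=1.0):
    """Add the adapter back onto a (domain fine-tuned) base model.

    `scale` is a hypothetical knob for weighting the adapter.
    """
    return {name: w + scale * adapter[name] for name, w in finetuned_base.items()}

def low_rank_adapter(delta, rank):
    """Assumed low-rank variant: keep only the top `rank` singular directions."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

# Toy stand-ins for real model weights (a single 4x3 matrix).
rng = np.random.default_rng(0)
base = {"w": rng.normal(size=(4, 3))}
instruct = {"w": base["w"] + rng.normal(size=(4, 3))}

# Step 1: isolate what instruction tuning added. No data or training involved.
adapter = reverse_engineer_adapter(instruct, base)

# Step 2: placeholder for fine-tuning the base model on a new domain.
domain_update = 0.1 * rng.normal(size=(4, 3))
finetuned_base = {"w": base["w"] + domain_update}

# Step 3: readapt the fine-tuned base to instruction following.
readapted = readapt(finetuned_base, adapter)

# Low-rank version of the same adapter.
adapter_rank1 = low_rank_adapter(adapter["w"], rank=1)
```

With no domain update, `readapt(base, adapter)` reproduces the instruction-tuned weights exactly, which is a useful sanity check before applying the adapter to a fine-tuned base.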

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01·Rating 5·Confidence 3

Strengths

1. The paper is well-motivated, highlighting the challenge of catastrophic forgetting when adapting current LLMs to new domains or distributions. The RE-Adapt method provides a practical solution to this issue.
2. The LoRE-Adapt method, based on LoRA, demonstrates lower memory usage compared to RE-Adapt, which is significant for training efficiency, building on the inherent advantages of low-rank approximation.
3. The proposed methods are compatible with retrieval-augmented generation, showcasin

Weaknesses

1. The applicability of the proposed method may be limited, as it relies on an instruction-tuned version of the original foundation model. While many open-source LLMs offer both versions, it remains unclear how RE-Adapt would perform if only the foundation model is available and we attempt to tune a weaker instruction-following model.
2. On line 201, the authors state that “any fine-tuning approach is applicable”, which is misleading. Directly employing a full-parameter fine-tuning method would

Reviewer 02·Rating 5·Confidence 4

Strengths

- The paper addresses the important challenge of adding new knowledge to LLMs while preserving their instruction-following capabilities.
- The method is well-grounded and aligns with existing literature on LLM arithmetic, leveraging the modular nature of model weights for adaptation.

Weaknesses

- The experiments focus exclusively on question-answering (QA) tasks and do not assess whether the instruction-following capabilities of instruction-tuned models are maintained post-adaptation. Additional experiments on common benchmarks for instruction-tuned models are expected.
- A comparative analysis between the original instruction adapter obtained via reverse engineering and the instruction adapter after further domain-specific pretraining and instruction fine-tuning could help validate

Reviewer 03·Rating 3·Confidence 4

Strengths

- The proposed method is straightforward, simple, and clear, making it easy for readers to understand. Additionally, the authors’ writing is fluent and well-executed.
- The authors have identified a problem that, while not highly focused on, is still of practical significance: the mismatch between instruction models and domain-specific downstream data.

Weaknesses

- Lack of Novelty: The methods proposed by the authors, namely RE-Adapt and LoRE-Adapt, are based on well-established concepts such as adding independent adapters to the model, utilizing Singular Value Decomposition (SVD) to extract key information from matrices, and combining adapters through weighted averaging. These methods were widely accepted and recognized by the academic community as early as 2021 with the introduction of LoRA. In the paper, the authors do not present any novel insights

Reviewer 04·Rating 3·Confidence 5

Strengths

* A highly-relevant research direction of adapting LLMs to new domains
* An interesting idea of constructing domain-adapted instructed models by task arithmetic. The simplicity of the method makes it easily applicable in practice to any domain of interest.
* The paper is well-written and easy to follow
* Well-structured and comprehensive Related Work
* The method is validated for three different LLMs

Weaknesses

1. Insufficient empirical validation of the proposed approach.
   - The paper conducts experiments on a single task (question answering), with two adaptation datasets (StreamingQA and RetrievalQA) and one general QA dataset (Natural Questions). Since the main goal of the proposed approach is to build a model capable of following instructions, a validation of the final model on a broader set of tasks is required, e.g. including open-ended instruction following, summarization, or multiple-choic

Taxonomy

Topics: Topic Modeling

Methods: Balanced Selection · Adapter