Interweaving Memories of a Siamese Large Language Model

Xin Song; Zhikai Xue; Guoxiu He; Jiawei Liu; Wei Lu

arXiv:2412.17383·cs.CL·December 24, 2024

TL;DR

This paper introduces IMSM, a model-agnostic PEFT framework for LLMs that interweaves original and fine-tuned memories to improve task performance and reduce catastrophic forgetting, while keeping efficiency comparable to the backbone PEFT method.

Contribution

IMSM is a novel PEFT framework that interweaves original and fine-tuned memories in a siamese LLM, enhancing knowledge retention and task performance.

Findings

01. IMSM significantly outperforms classical PEFT methods in benchmark tasks.

02. IMSM maintains efficiency comparable to backbone PEFT methods.

03. IMSM effectively mitigates catastrophic forgetting in LLMs.

Abstract

Parameter-efficient fine-tuning (PEFT) methods optimize large language models (LLMs) by modifying or introducing a small number of parameters to enhance alignment with downstream tasks. However, they can result in catastrophic forgetting, where LLMs prioritize new knowledge at the expense of comprehensive world knowledge. A promising approach to mitigate this issue is to recall prior memories based on the original knowledge. To this end, we propose a model-agnostic PEFT framework, IMSM, which Interweaves Memories of a Siamese Large Language Model. Specifically, our siamese LLM is equipped with an existing PEFT method. Given an incoming query, it generates two distinct memories based on the pre-trained and fine-tuned parameters. IMSM then incorporates an interweaving mechanism that regulates the contributions of both original and enhanced memories when generating the next token. This…
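The interweaving step described in the abstract can be illustrated with a toy sketch. The abstract does not specify the form of the mechanism, so everything here is an assumption made for illustration: a single scalar sigmoid gate computed from the concatenated memories, and the hypothetical names `interweave`, `gate_weights`, and `gate_bias`. It is not the authors' implementation; see the official repository for that.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def interweave(h_orig, h_tuned, gate_weights, gate_bias=0.0):
    """Blend an 'original' and a 'fine-tuned' memory vector.

    Assumed form (not taken from the paper): a scalar gate g is computed
    from the concatenation [h_orig; h_tuned], and the output is
    g * h_tuned + (1 - g) * h_orig.
    """
    if len(h_orig) != len(h_tuned):
        raise ValueError("both memories must have the same dimension")
    features = list(h_orig) + list(h_tuned)
    if len(gate_weights) != len(features):
        raise ValueError("gate_weights must match the concatenated dimension")

    # Gate score: linear projection of the concatenated memories.
    score = sum(w * x for w, x in zip(gate_weights, features)) + gate_bias
    g = sigmoid(score)

    # Convex combination: g weights the fine-tuned memory,
    # (1 - g) weights the original (pre-trained) memory.
    mixed = [g * t + (1.0 - g) * o for o, t in zip(h_orig, h_tuned)]
    return mixed, g


# Toy memories for one token position (made-up values).
h_orig = [1.0, 0.0]
h_tuned = [0.0, 1.0]

# With zero gate weights and zero bias the gate is exactly 0.5,
# so both memories contribute equally.
mixed, g = interweave(h_orig, h_tuned, gate_weights=[0.0, 0.0, 0.0, 0.0])
```

In this sketch, a gate near 1 lets the fine-tuned memory dominate, while a gate near 0 falls back on the pre-trained memory, which is the intuition behind recalling original knowledge to limit forgetting.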

Code & Models

Repositories

ecnu-text-computing/imsm (PyTorch, official)

Videos

Interweaving Memories of a Siamese Large Language Model · Underline

Taxonomy

Topics: Southeast Asian Sociopolitical Studies · Multilingual Education and Policy