LLM Modules: Knowledge Transfer from a Large to a Small Model using Enhanced Cross-Attention

Konstantin Kolomeitsev (Almaty, Kazakhstan)

arXiv:2502.08213·cs.CL·February 13, 2025

TL;DR

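This paper introduces LLM Modules, a modular architecture that transfers knowledge from a frozen large model (Qwen2-1.5B) to a small trainable model (GPT-Neo-125M) through Enhanced Cross-Attention layers. After 15 epochs of training on limited computational resources, the combined model's responses are reported to be comparable in quality to those obtained by distillation.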

Contribution

The paper presents a modular approach in which Enhanced Cross-Attention layers pass the representations of a frozen large language model to a smaller model that is trained on limited resources.

Findings

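01 After 15 epochs of training, the combined model generates responses comparable in quality to those obtained by distillation.

02 The experimental results were obtained on the Bespoke-Stratos-17k dataset.

03 The paper discusses the advantages of the modular architecture and outlines prospects for extending the method.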

Abstract

In this work, we propose an architecture of LLM Modules that enables the transfer of knowledge from a large pre-trained model to a smaller model using an Enhanced Cross-Attention mechanism. In the proposed scheme, the Qwen2-1.5B model is frozen and its representations are passed through specially designed attention layers to the GPT-Neo-125M model, which is trained on limited computational resources. Experimental results on the Bespoke-Stratos-17k dataset demonstrate that after 15 epochs of training, the combined model generates responses comparable in quality to those obtained by distillation. We discuss the advantages of the modular approach, provide examples of input queries and comparative analysis, and outline prospects for further extension of the method.
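The abstract describes representations from the frozen large model being passed to the small model through attention layers, but it does not spell out what makes the cross-attention "enhanced". As a rough illustration only, the sketch below shows plain single-head cross-attention in pure Python: queries come from the small model's hidden states, keys and values come from the large model's hidden states and are projected down to the small model's width. The function names, toy dimensions, random weights, and the residual connection are all assumptions made for this example, not details taken from the paper.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical stand-in for learned weights or model hidden states."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def cross_attention(small_hidden, large_hidden, w_q, w_k, w_v):
    """Single-head cross-attention from small-model states to large-model states.

    small_hidden: (T_small, d_small) states of the trainable small model
    large_hidden: (T_large, d_large) states of the frozen large model
    w_q: (d_small, d_small); w_k, w_v: (d_large, d_small)
    """
    q = matmul(small_hidden, w_q)            # (T_small, d_small)
    k = matmul(large_hidden, w_k)            # (T_large, d_small)
    v = matmul(large_hidden, w_v)            # (T_large, d_small)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]     # (d_small, T_large)
    scores = matmul(q, k_t)                  # (T_small, T_large)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    context = matmul(weights, v)             # (T_small, d_small)
    # Residual connection (an assumption): add the attended context
    # to the small model's own hidden states.
    fused = [[h + c for h, c in zip(h_row, c_row)]
             for h_row, c_row in zip(small_hidden, context)]
    return fused, weights


# Toy sizes, chosen only to keep the example small.
rng = random.Random(0)
d_large, d_small, t_large, t_small = 8, 4, 5, 3
large_hidden = random_matrix(t_large, d_large, rng, scale=1.0)
small_hidden = random_matrix(t_small, d_small, rng, scale=1.0)
w_q = random_matrix(d_small, d_small, rng)
w_k = random_matrix(d_large, d_small, rng)
w_v = random_matrix(d_large, d_small, rng)

out, weights = cross_attention(small_hidden, large_hidden, w_q, w_k, w_v)
print(len(out), len(out[0]), len(weights[0]))
```

In a setup like the one the abstract describes, only the small model and the connecting attention layers would receive gradient updates, while the large model's hidden states would be computed once per input without gradients. Here that would correspond to treating `large_hidden` as a fixed input.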

Code & Models

Models

kkolomeitsev/llm-modules (Hugging Face model)

Taxonomy

Topics: Neural Networks and Applications