LLM Modules: Knowledge Transfer from a Large to a Small Model using Enhanced Cross-Attention
Konstantin Kolomeitsev (Almaty, Kazakhstan)

TL;DR
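This paper introduces LLM Modules, a modular architecture that transfers knowledge from a frozen large model (Qwen2-1.5B) to a small trainable model (GPT-Neo-125M) through Enhanced Cross-Attention layers. Trained on limited computational resources, the combined model produces responses comparable in quality to those obtained by distillation.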
Contribution
The paper presents a novel modular approach with Enhanced Cross-Attention for efficient knowledge transfer from large to small language models.
Findings
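After 15 epochs of training, the combined model generates responses comparable in quality to those obtained by distillation.
Experiments were run on the Bespoke-Stratos-17k dataset.
The paper discusses the advantages of the modular approach, with example input queries and a comparative analysis.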
Abstract
In this work, we propose an architecture of LLM Modules that enables the transfer of knowledge from a large pre-trained model to a smaller model using an Enhanced Cross-Attention mechanism. In the proposed scheme, the Qwen2-1.5B model is frozen and its representations are passed through specially designed attention layers to the GPT-Neo-125M model, which is trained on limited computational resources. Experimental results on the Bespoke-Stratos-17k dataset demonstrate that after 15 epochs of training, the combined model generates responses comparable in quality to those obtained by distillation. We discuss the advantages of the modular approach, provide examples of input queries and comparative analysis, and outline prospects for further extension of the method.
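As a rough illustration of the idea in the abstract, the sketch below lets the hidden states of a small model attend to the hidden states of a larger, frozen model. It is plain scaled dot-product cross-attention with a residual connection, not the paper's Enhanced Cross-Attention, whose details are not given here. The function names, the toy widths (`d_small`, `d_large`), and the random weights are all assumptions made for the example; only the standard library is used so that it runs anywhere.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng, scale=0.1):
    """Toy stand-in for hidden states or learned weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def cross_attention(small_h, large_h, w_q, w_k, w_v):
    """Queries come from the small model, keys and values from the large one.

    small_h: (T_s x d_small) hidden states of the small, trainable model.
    large_h: (T_l x d_large) hidden states of the large, frozen model.
    w_k and w_v also project from d_large down to d_small.
    Returns the updated small-model states and the attention weights.
    """
    q = matmul(small_h, w_q)                      # T_s x d_small
    k = matmul(large_h, w_k)                      # T_l x d_small
    v = matmul(large_h, w_v)                      # T_l x d_small
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]          # d_small x T_l
    scores = matmul(q, k_t)                       # T_s x T_l
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    context = matmul(weights, v)                  # T_s x d_small
    # Residual connection: add the attended context to the small-model states.
    updated = [[h + c for h, c in zip(h_row, c_row)]
               for h_row, c_row in zip(small_h, context)]
    return updated, weights

rng = random.Random(0)
d_small, d_large = 4, 6                           # toy widths, not the real model sizes
small_h = random_matrix(3, d_small, rng)          # 3 positions in the small model
large_h = random_matrix(5, d_large, rng)          # 5 positions in the large model
w_q = random_matrix(d_small, d_small, rng)
w_k = random_matrix(d_large, d_small, rng)
w_v = random_matrix(d_large, d_small, rng)

updated, weights = cross_attention(small_h, large_h, w_q, w_k, w_v)
print(len(updated), len(updated[0]))              # shape of the updated states
```

In a real setup only the projection weights and the small model would receive gradient updates, while the large model's parameters stay fixed; that training loop is outside the scope of this sketch.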
Taxonomy
Topics: Neural Networks and Applications
