Facilitating large language model Russian adaptation with Learned Embedding Propagation

Mikhail Tikhomirov; Daniil Chernyshev

arXiv:2412.21140 · cs.CL · December 31, 2024

TL;DR

This paper introduces Learned Embedding Propagation (LEP), a cost-effective method to adapt large language models to Russian without extensive instruction-tuning, maintaining high performance and reducing data requirements.

Contribution

LEP enables direct embedding of new language knowledge into existing models, bypassing traditional instruction-tuning and lowering data and computational costs.

Findings

01

LEP achieves performance comparable to instruction-tuning methods.

02

LEP improves Russian language adaptation in LLMs with minimal data.

03

Self-calibration further enhances task-solving capabilities.

Abstract

Rapid advancements of large language model (LLM) technologies have led to the introduction of powerful open-source instruction-tuned LLMs that have the same text generation quality as state-of-the-art counterparts such as GPT-4. While the emergence of such models accelerates the adoption of LLM technologies in sensitive-information environments, the authors of such models do not disclose the training data necessary for replication of the results, thus making the achievements model-exclusive. Since those open-source models are also multilingual, this in turn reduces the benefits of training language-specific LLMs, as improved inference computation efficiency becomes the only guaranteed advantage of such a costly procedure. More cost-efficient options such as vocabulary extension and subsequent continued pre-training are also inhibited by the lack of access to high-quality instruction-tuning…

Code & Models

Repositories

RefalMachine/llmtf_open (official)

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Softmax · Dense Connections · Dropout · Residual Connection · Multi-Head Attention · Adam