Facilitating large language model Russian adaptation with Learned Embedding Propagation
Mikhail Tikhomirov, Daniil Chernyshev

TL;DR
This paper introduces Learned Embedding Propagation (LEP), a cost-effective method to adapt large language models to Russian without extensive instruction-tuning, maintaining high performance and reducing data requirements.
Contribution
LEP enables direct embedding of new language knowledge into existing models, bypassing traditional instruction-tuning and lowering data and computational costs.
Findings
LEP achieves performance comparable to instruction-tuning methods.
LEP improves Russian language adaptation in LLMs with minimal data.
Self-calibration further enhances task-solving capabilities.
Abstract
Rapid advancements of large language model (LLM) technologies led to the introduction of powerful open-source instruction-tuned LLMs that have the same text generation quality as the state-of-the-art counterparts such as GPT-4. While the emergence of such models accelerates the adoption of LLM technologies in sensitive-information environments the authors of such models don not disclose the training data necessary for replication of the results thus making the achievements model-exclusive. Since those open-source models are also multilingual this in turn reduces the benefits of training a language specific LLMs as improved inference computation efficiency becomes the only guaranteed advantage of such costly procedure. More cost-efficient options such as vocabulary extension and subsequent continued pre-training are also inhibited by the lack of access to high-quality instruction-tuning…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
- 🤗RefalMachine/ruadapt_qwen2.5_3B_ext_u48_instruct_v4model· 235 dl· ♡ 30235 dl♡ 30
- 🤗RefalMachine/ruadapt_qwen2.5_3B_ext_u48_instruct_v4_ggufmodel· 1.5k dl· ♡ 151.5k dl♡ 15
- 🤗RefalMachine/ruadapt_qwen2.5_7B_ext_u48_instructmodel· 16 dl· ♡ 716 dl♡ 7
- 🤗RefalMachine/ruadapt_qwen2.5_7B_ext_u48_instruct_ggufmodel· 376 dl· ♡ 9376 dl♡ 9
- 🤗msu-rcc-lair/RuadaptQwen2.5-32B-Instructmodel· 63 dl· ♡ 4863 dl♡ 48
- 🤗msu-rcc-lair/RuadaptQwen2.5-32B-instruct-GGUFmodel· 119 dl· ♡ 18119 dl♡ 18
- 🤗RefalMachine/RuadaptQwen2.5-1.5B-instructmodel· 102 dl· ♡ 8102 dl♡ 8
- 🤗RefalMachine/RuadaptQwen2.5-14B-Instructmodel· 65 dl· ♡ 565 dl♡ 5
- 🤗RefalMachine/RuadaptQwen2.5-14B-instruct-GGUFmodel· 266 dl· ♡ 6266 dl♡ 6
- 🤗RefalMachine/RuadaptQwen2.5-14B-Instruct-1Mmodel· 6 dl6 dl
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis
MethodsAttention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Softmax · Dense Connections · Dropout · Residual Connection · Multi-Head Attention · Adam
