TL;DR
This paper presents a method to adapt English GPT-2 models to Italian and Dutch by retraining lexical embeddings, enabling realistic sentence generation in these languages with minimal additional training.
Contribution
It introduces a novel lexical embedding adaptation technique that allows repurposing English GPT-2 models for other languages without full retraining.
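A minimal sketch of the idea, assuming PyTorch and a toy stand-in model rather than the real GPT-2 weights: the vocabulary embeddings are replaced and trained, while the Transformer layers are frozen. `TinyLM` and `adapt_to_new_vocab` are hypothetical names used only for illustration, and the toy omits position embeddings for brevity. It is not the authors' training code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyLM(nn.Module):
    """Toy stand-in for GPT-2: embeddings -> Transformer body -> tied output layer."""

    def __init__(self, vocab_size, d_model=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=64, dropout=0.0, batch_first=True
        )
        self.body = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, ids):
        seq_len = ids.size(1)
        # Causal mask: each position attends only to itself and earlier positions.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        hidden = self.body(self.embed(ids), mask=mask)
        # Output layer tied to the input embeddings, as in GPT-2.
        return hidden @ self.embed.weight.T


def adapt_to_new_vocab(model, new_vocab_size):
    """Swap in fresh embeddings for the new language and freeze the body (hypothetical helper)."""
    d_model = model.embed.embedding_dim
    model.embed = nn.Embedding(new_vocab_size, d_model)  # new, trainable
    for p in model.body.parameters():
        p.requires_grad = False  # Transformer layers are not tuned
    return [p for p in model.parameters() if p.requires_grad]


model = TinyLM(vocab_size=100)                 # pretend this is the "English" model
trainable = adapt_to_new_vocab(model, 120)     # pretend 120 is the new-language vocabulary

body_before = [p.detach().clone() for p in model.body.parameters()]
embed_before = model.embed.weight.detach().clone()

# One next-token-prediction step on random token ids (placeholder data).
optimizer = torch.optim.Adam(trainable, lr=1e-2)
ids = torch.randint(0, 120, (4, 16))
logits = model(ids[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 120), ids[:, 1:].reshape(-1))
loss.backward()
optimizer.step()

body_unchanged = all(
    torch.equal(a, b) for a, b in zip(body_before, model.body.parameters())
)
embed_changed = not torch.equal(embed_before, model.embed.weight)
```

After the step, only the embedding matrix has moved. The frozen body is what would carry over the knowledge learned by the original model.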
Findings
Sentences generated in Italian and Dutch are realistic and comparable to those produced by fully trained models.
The method minimizes training time and preserves original model knowledge.
Relearned lexical embeddings align with original English embeddings.
Abstract
Large generative language models have been very successful for English, but other languages lag behind, in part due to data and computational limitations. We propose a method that may overcome these problems by adapting existing pre-trained models to new languages. Specifically, we describe the adaptation of English GPT-2 to Italian and Dutch by retraining lexical embeddings without tuning the Transformer layers. As a result, we obtain lexical embeddings for Italian and Dutch that are aligned with the original English lexical embeddings. Additionally, we scale up complexity by transforming relearned lexical embeddings of GPT-2 small to the GPT-2 medium embedding space. This method minimises the amount of training and prevents losing information during adaptation that was learned by GPT-2. English GPT-2 models with relearned lexical embeddings can generate realistic sentences in Italian…
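The abstract does not say how the small-to-medium transformation is computed. One plausible reading, assumed here purely for illustration, is a linear map fitted by least squares on English embeddings that exist in both spaces, then applied to the relearned embeddings. The sketch below uses NumPy with synthetic data and toy dimensions (the real hidden sizes are 768 for GPT-2 small and 1024 for GPT-2 medium). All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d_small, d_medium = 8, 12      # toy sizes standing in for 768 and 1024
n_english, n_dutch = 200, 50   # toy vocabulary sizes

# Synthetic "English" embeddings for the same tokens in both spaces.
en_small = rng.normal(size=(n_english, d_small))
true_map = rng.normal(size=(d_small, d_medium))
en_medium = en_small @ true_map + 0.01 * rng.normal(size=(n_english, d_medium))

# Assumption: fit W so that en_small @ W approximates en_medium (least squares).
W, *_ = np.linalg.lstsq(en_small, en_medium, rcond=None)

# Apply the same map to the relearned (here: random placeholder) Dutch embeddings.
nl_small = rng.normal(size=(n_dutch, d_small))
nl_medium = nl_small @ W
```

Under this assumption the transformed embeddings would serve as a starting point in the larger model's space, so the larger model would not need its embeddings learned from scratch.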
Code & Models
- 🤗 GroNLP/gpt2-medium-dutch-embeddings (model) · 964 dl · ♡ 3
- 🤗 GroNLP/gpt2-medium-italian-embeddings (model) · 23 dl · ♡ 3
- 🤗 GroNLP/gpt2-small-dutch-embeddings (model) · 30 dl · ♡ 2
- 🤗 GroNLP/gpt2-small-dutch (model) · 15k dl · ♡ 6
- 🤗 GroNLP/gpt2-small-italian-embeddings (model) · 15 dl · ♡ 1
- 🤗 GroNLP/gpt2-small-italian (model) · 5.0k dl · ♡ 13
- 🤗 RichardErkhov/GroNLP_-_gpt2-medium-italian-embeddings-4bits (model) · 3 dl
- 🤗 RichardErkhov/GroNLP_-_gpt2-medium-italian-embeddings-8bits (model) · 3 dl
- 🤗 RichardErkhov/GroNLP_-_gpt2-medium-italian-embeddings-gguf (model) · 111 dl
- 🤗 RichardErkhov/GroNLP_-_gpt2-small-dutch-4bits (model) · 3 dl
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Label Smoothing · Adam · Residual Connection · Dropout · Attention Is All You Need
