GEITje 7B Ultra: A Conversational Model for Dutch
Bram Vanroy

TL;DR
This paper introduces GEITje 7B Ultra, a Dutch conversational language model built through supervised finetuning and preference alignment on synthetic datasets, enhancing Dutch NLP capabilities.
Contribution
The paper presents an improved Dutch conversational model, GEITje 7B Ultra, developed with new synthetic datasets and alignment techniques, expanding multilingual NLP resources.
Findings
Enhanced Dutch conversational abilities demonstrated
Open availability of models and datasets
Effective use of synthetic data for model training
Abstract
Language models have rapidly evolved, predominantly focusing on English while often neglecting extensive pretraining in other languages. This approach has required initiatives to adapt powerful, English-centric models to other linguistic contexts through finetuning. For Dutch, one such recent endeavour is "GEITje", a model originally derived from the English-based Mistral 7B. Building on this foundational work, the current research extends the capabilities of GEITje by supervised finetuning on newly created high-quality synthetic conversational datasets, along with an additional preference alignment procedure on a synthetic feedback dataset. Both the developed models and the created datasets are openly available.
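The abstract mentions a preference alignment step on a feedback dataset but does not name the objective used. As an illustration only, the sketch below assumes a Direct Preference Optimization (DPO) style loss, which is one common choice for this kind of alignment. The function name `dpo_loss`, its parameters, and the `beta` default are hypothetical and not taken from the paper. Each input is the summed log-probability of a full response under either the model being trained (the policy) or a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO-style loss for a single (chosen, rejected) response pair.

    Assumption: the alignment objective is DPO; the source text does not say so.
    """
    # How much more likely the policy makes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled margin: positive when the policy favours the chosen response
    # more strongly than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # Loss is -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2. The loss falls as the policy shifts probability toward the preferred response relative to the reference, and rises when it shifts toward the rejected one.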
Code & Models
- 🤗 BramVanroy/GEITje-7B-ultra-GGUF · model · 219 dl · ♡ 10
- 🤗 BramVanroy/GEITje-7B-ultra-sft · model · 21 dl · ♡ 5
- 🤗 BramVanroy/GEITje-7B-ultra · model · 447 dl · ♡ 53
- 🤗 RichardErkhov/BramVanroy_-_GEITje-7B-ultra-8bits · model · 1 dl
- 🤗 RichardErkhov/BramVanroy_-_GEITje-7B-ultra-awq · model
- 🤗 tostideluxekaas/GEITje-7b-uncensored · model
- 🤗 tostideluxekaas/GEITje-7b-uncensored-GGUF · model · 56 dl
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
