
TL;DR
LuxMT is a machine translation system fine-tuned from Gemma 3 27B to translate Luxembourgish into French and English. It is evaluated on a new benchmark, trained on data filtered with LuxEmbedder sentence embeddings, and reported to improve strongly over the Gemma 3 baseline. The paper also explores LuxEmbedder as a quality estimation metric.
Contribution
The paper introduces LuxMT, a translation system specialized for Luxembourgish, along with a new benchmark dataset and a data filtering step that uses LuxEmbedder sentence embeddings to remove low-equivalence segment pairs.
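The filtering idea can be illustrated with a minimal sketch, assuming that "low-equivalence" pairs are those whose source and target embeddings fall below a cosine-similarity cutoff. The names `filter_pairs` and `toy_embed`, the toy vectors, and the 0.8 threshold are all hypothetical; the summary does not state how LuxEmbedder is called or which cutoff the authors use.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def filter_pairs(pairs, embed, threshold=0.8):
    """Keep (source, target) pairs whose embeddings are at least `threshold` similar.

    `embed` is any callable mapping a string to a vector; the threshold
    value is an assumption made for this example only.
    """
    return [
        (src, tgt)
        for src, tgt in pairs
        if cosine(embed(src), embed(tgt)) >= threshold
    ]

# Hypothetical stand-in for a real sentence-embedding model.
TOY_VECTORS = {
    "Moien": [1.0, 0.0],
    "Bonjour": [0.9, 0.1],
    "Merci": [0.0, 1.0],
}

def toy_embed(text):
    return TOY_VECTORS[text]

candidate_pairs = [("Moien", "Bonjour"), ("Moien", "Merci")]
kept_pairs = filter_pairs(candidate_pairs, toy_embed)
```

In practice `toy_embed` would be replaced by a call to a multilingual sentence encoder, and the threshold would be tuned on held-out data rather than fixed in advance.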
Findings
LuxMT outperforms the Gemma 3 baseline on translation from Luxembourgish.
LuxMT also improves over the baseline when translating into German, a target language absent from its training data.
LuxEmbedder correlates strongly with existing quality metrics, indicating potential as a quality estimation tool.
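As a rough illustration of how such a correlation could be checked, the sketch below computes a Pearson coefficient between per-segment embedding similarities and scores from another metric. Pearson is an assumption here (the summary does not say which correlation statistic the paper reports), and the numbers are made up for the example, not results from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Illustrative values only: one embedding similarity and one
# reference-metric score per translated segment.
embedding_similarity = [0.2, 0.5, 0.9]
reference_metric = [10.0, 25.0, 45.0]
r = pearson(embedding_similarity, reference_metric)
```

A value of `r` close to 1 would indicate that the embedding-based score ranks and scales segments much like the reference metric does.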
Abstract
We introduce LuxMT, a machine translation system based on Gemma 3 27B and fine-tuned for translation from Luxembourgish (LB) into French (FR) and English (EN). To assess translation performance, we construct a novel benchmark covering LB-FR, LB-EN, and LB-DE using human-translated data from Luci, a tourist magazine about Luxembourg. Training data stems from LuxAlign, a parallel corpus of multilingual Luxembourgish news articles, and LB parliamentary transcripts augmented with Google Translate. We filter the data using LuxEmbedder, LB sentence embeddings, to remove low-equivalence segment-pairs. Overall, LuxMT's results suggest strong improvements over the Gemma 3 baseline, even for translating LB to German (DE), despite the training data not containing any DE. We also explore LuxEmbedder's potential to be used as a quality estimation metric and find strong correlations with other…
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices · Text Readability and Simplification
