LuxEmbedder: A Cross-Lingual Approach to Enhanced Luxembourgish Sentence Embeddings
Fred Philippy, Siwen Guo, Jacques Klein, Tegawendé F. Bissyandé

TL;DR
LuxEmbedder is a cross-lingual sentence embedding model for Luxembourgish, trained on a relatively small but high-quality, human-generated parallel dataset. It outperforms previous models on Luxembourgish tasks, and the authors also provide a paraphrase detection benchmark for low-resource language NLP.
Contribution
The paper introduces LuxEmbedder, a new sentence embedding model for Luxembourgish, and provides a high-quality parallel dataset along with a paraphrase detection benchmark for low-resource languages.
Findings
Including low-resource languages in parallel training data can benefit other low-resource languages more than relying solely on high-resource language pairs.
LuxEmbedder outperforms previous models on Luxembourgish tasks.
A new paraphrase detection benchmark facilitates future research in low-resource NLP.
Abstract
Sentence embedding models play a key role in various Natural Language Processing tasks, such as in Topic Modeling, Document Clustering and Recommendation Systems. However, these models rely heavily on parallel data, which can be scarce for many low-resource languages, including Luxembourgish. This scarcity results in suboptimal performance of monolingual and cross-lingual sentence embedding models for these languages. To address this issue, we compile a relatively small but high-quality human-generated cross-lingual parallel dataset to train LuxEmbedder, an enhanced sentence embedding model for Luxembourgish with strong cross-lingual capabilities. Additionally, we present evidence suggesting that including low-resource languages in parallel training datasets can be more advantageous for other low-resource languages than relying solely on high-resource language pairs. Furthermore,…
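To make the idea of a sentence-embedding-based paraphrase detection task concrete, here is a minimal sketch of how such a task is commonly scored: each sentence is mapped to a vector, and the candidate whose vector has the highest cosine similarity to the anchor is chosen as the paraphrase. The vectors below are hand-written toy values, not LuxEmbedder outputs, and the names `cosine_similarity` and `pick_paraphrase` are hypothetical helpers introduced only for illustration. The exact format of the paper's benchmark is not described here, so this setup is an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pick_paraphrase(anchor, candidates):
    """Return the index of the candidate most similar to the anchor."""
    scores = [cosine_similarity(anchor, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
# In practice these would come from a sentence embedding model.
anchor = [0.9, 0.1, 0.3]          # e.g. a Luxembourgish sentence
candidate_a = [0.8, 0.2, 0.35]    # e.g. a paraphrase in another language
candidate_b = [0.1, 0.9, 0.2]     # e.g. an unrelated sentence

best = pick_paraphrase(anchor, [candidate_a, candidate_b])
print("Most similar candidate index:", best)
```

A model with strong cross-lingual capabilities should place a sentence and its translation or paraphrase close together in this vector space, which is what the similarity ranking relies on.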
Taxonomy
Topics: Natural Language Processing Techniques
