Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning
Omer Nacar, Anis Koubaa

TL;DR
This paper introduces a novel nested embedding learning framework for Arabic NLP, demonstrating that language-specific Matryoshka models significantly improve semantic similarity understanding over traditional approaches.
Contribution
The work develops a new Arabic nested embedding framework using Matryoshka learning, with datasets translated into Arabic for comprehensive evaluation, showing superior performance in semantic similarity tasks.
Findings
Matryoshka embedding models outperform traditional models by 20-25%.
Models effectively capture Arabic-specific semantic nuances.
Evaluation across multiple metrics confirms improved performance.
Abstract
This work presents a novel framework for training Arabic nested embedding models through Matryoshka Embedding Learning, leveraging multilingual, Arabic-specific, and English-based models, to highlight the power of nested embedding models in various Arabic NLP downstream tasks. Our contribution includes the translation of various sentence similarity datasets into Arabic, enabling a comprehensive evaluation framework to compare these models across different dimensions. We trained several nested embedding models on the Arabic Natural Language Inference triplet dataset and assessed their performance using multiple evaluation metrics, including Pearson and Spearman correlations for cosine similarity, Manhattan distance, Euclidean distance, and dot product similarity. The results demonstrate the superior performance of the Matryoshka embedding models, particularly in capturing…
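To make the evaluation described in the abstract concrete, here is a minimal sketch of how a nested (Matryoshka) embedding might be scored at a reduced dimension. It assumes the usual Matryoshka convention that the first `dim` coordinates of a vector form a usable smaller embedding. It covers only the cosine-similarity variant of the Pearson and Spearman metrics. The function names, the 4-dimensional toy vectors, and the gold scores are all hypothetical and are not taken from the paper.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` coordinates and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation, computed as Pearson on the ranks."""
    return pearson(ranks(xs), ranks(ys))

def evaluate(pairs, gold, dim):
    """Score sentence-pair embeddings at a given nested dimension."""
    preds = [
        cosine(truncate_and_normalize(a, dim), truncate_and_normalize(b, dim))
        for a, b in pairs
    ]
    return pearson(preds, gold), spearman(preds, gold)

# Hypothetical toy data: three sentence pairs with made-up embeddings
# and made-up gold similarity scores on a 0-5 scale.
pairs = [
    ([1.0, 0.0, 0.2, 0.1], [1.0, 0.1, 0.1, 0.3]),
    ([1.0, 0.0, 0.5, 0.0], [0.5, 0.5, 0.0, 0.4]),
    ([1.0, 0.0, 0.0, 0.3], [0.0, 1.0, 0.3, 0.0]),
]
gold = [5.0, 3.0, 0.0]

for dim in (2, 4):
    p, s = evaluate(pairs, gold, dim)
    print(f"dim={dim}: pearson={p:.3f} spearman={s:.3f}")
```

Running the same evaluation at several values of `dim` is how one would check how much similarity quality a nested model keeps as the embedding shrinks. A real evaluation would use model-produced embeddings and a labeled dataset in place of the toy values.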
Code & Models
- Omartificial-Intelligence-Space/Arabic-all-nli-triplet-Matryoshka (model) · 215 downloads · 4 likes
- Omartificial-Intelligence-Space/Arabic-mpnet-base-all-nli-triplet (model) · 247 downloads · 10 likes
- Omartificial-Intelligence-Space/Arabic-labse-Matryoshka (model) · 511 downloads · 5 likes
- Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka (model) · 317 downloads · 11 likes
- Omartificial-Intelligence-Space/Marbert-all-nli-triplet-Matryoshka (model) · 179 downloads · 1 like
- Omartificial-Intelligence-Space/Arabic-NLi-Triplet (dataset) · 53 downloads
- Omartificial-Intelligence-Space/Arabic-stsb (dataset) · 190 downloads
- Omartificial-Intelligence-Space/Arabic-NLi-Pair-Class (dataset) · 134 downloads
- Omartificial-Intelligence-Space/Arabic-NLi-Pair-Score (dataset) · 24 downloads
- Omartificial-Intelligence-Space/Arabic-NLi-Pair (dataset) · 27 downloads
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
