TL;DR
This paper develops and evaluates various semantic models for Amharic, demonstrating that monolingually trained models, especially those based on contextual embeddings, outperform existing multilingual models across multiple NLP tasks.
Contribution
The study introduces nine new Amharic semantic models trained on monolingual data, improving performance over existing multilingual models and providing insights into effective embedding techniques.
Findings
New models outperform pre-trained multilingual models.
Contextual embeddings from RoBERTa outperform word2Vec models.
Monolingual training enhances model performance for Amharic.
Abstract
The availability of different pre-trained semantic models has enabled the quick development of machine learning components for downstream applications. Despite the availability of abundant text data for low-resource languages, only a few semantic models are publicly available. Publicly available pre-trained models are usually built as multilingual versions of semantic models that cannot fit each language well due to context variations. In this work, we introduce different semantic models for Amharic. After experimenting with the existing pre-trained semantic models, we trained and fine-tuned nine new models using a monolingual text corpus. The models are built using word2Vec embeddings, distributional thesaurus (DT), contextual embeddings, and DT embeddings obtained via network embedding algorithms. Moreover, we employ these models for different NLP tasks and investigate…
Taxonomy
Methods: Linear Layer · Dense Connections · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Adam · Residual Connection · Dropout
