hauWE: Hausa Words Embedding for Natural Language Processing

Idris Abdulmumin; Bashir Shehu Galadanci

arXiv:1911.10708 · cs.CL · January 8, 2020


TL;DR

This paper introduces hauWE, a pair of Hausa word embedding models trained with Word2Vec's CBoW and Skip Gram architectures. Both outperform the only previous Hausa embeddings, which were trained with fastText, at predicting similar words, making them more useful for Hausa NLP applications.

Contribution

The paper presents larger and more effective Hausa word embedding models using Word2Vec, improving upon the only existing Hausa embeddings based on fastText.

Findings

01

hauWE CBoW achieves 88.7% accuracy

02

hauWE SG achieves 79.3% accuracy

03

both outperform the previous fastText-based model, which achieves 22.3% accuracy

Abstract

Word embeddings (distributed word vector representations) have become an essential component of many natural language processing (NLP) tasks such as machine translation, sentiment analysis, word analogy, named entity recognition and word similarity. Despite this, the only work that provides word vectors for the Hausa language is that of Bojanowski et al. [1], trained using fastText and consisting of only a few word vectors. This work presents word embedding models trained using Word2Vec's Continuous Bag of Words (CBoW) and Skip Gram (SG) models. The models, hauWE (Hausa Words Embedding), are larger and more accurate than the only previous model, making them more useful in NLP tasks. To compare the models, each was used to predict the 10 most similar words to 30 randomly selected Hausa words. hauWE CBoW's 88.7% and hauWE SG's 79.3% prediction accuracy greatly outperformed the 22.3% of Bojanowski et al. [1].
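The evaluation described in the abstract can be illustrated with a small sketch. It assumes that similarity is measured by cosine similarity between word vectors, and that accuracy is the fraction of predicted neighbours judged to be genuinely similar to the query word (with 30 query words and 10 predictions each, that would be a fraction of 300 predictions). Neither assumption is stated in the abstract. The function names, the toy vectors and the judgment sets below are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(vectors, query, k=10):
    """Return the k words whose vectors are closest to the query word."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:k]]


def prediction_accuracy(vectors, judged_similar, k=10):
    """Fraction of predicted neighbours that appear in the judged-similar sets.

    judged_similar maps each query word to the set of words accepted as
    similar (assumed here to come from human judgment).
    """
    correct = 0
    total = 0
    for query, acceptable in judged_similar.items():
        predictions = most_similar(vectors, query, k)
        correct += sum(1 for w in predictions if w in acceptable)
        total += len(predictions)
    return correct / total if total else 0.0


# Toy 3-dimensional vectors, invented for illustration only.
toy_vectors = {
    "ruwa": [0.9, 0.1, 0.0],    # water
    "kogi": [0.8, 0.2, 0.1],    # river
    "teku": [0.85, 0.15, 0.05], # sea
    "mota": [0.0, 0.9, 0.3],    # car
    "keke": [0.1, 0.8, 0.4],    # bicycle
}

# Hypothetical judgments: which neighbours count as correct for each query.
toy_judgments = {
    "ruwa": {"kogi", "teku"},
    "mota": {"keke"},
}

accuracy = prediction_accuracy(toy_vectors, toy_judgments, k=2)
print(f"prediction accuracy: {accuracy:.1%}")
```

With real models, `toy_vectors` would be replaced by the vectors loaded from each embedding model, and the same query words and judgments would be applied to every model so that the resulting percentages are comparable.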


Taxonomy

Methods: fastText