HausaMT v1.0: Towards English-Hausa Neural Machine Translation
Adewale Akinfaderin

TL;DR
This paper presents HausaMT v1.0, a baseline neural machine translation model for English-Hausa, addressing low-resource language challenges through curated datasets and evaluation of different architectures and tokenization methods.
Contribution
It introduces a new baseline for English-Hausa translation with curated datasets and compares Recurrent and Transformer models using word-level and BPE tokenization.
Findings
Transformer outperforms Recurrent models
BPE tokenization improves translation quality
Baseline datasets enable future research
Abstract
Neural Machine Translation (NMT) for low-resource languages suffers from low performance because of the lack of large amounts of parallel data and language diversity. To help address this problem, we built a baseline model for English-Hausa machine translation, which is considered a low-resource language task. Hausa is the second-largest Afro-Asiatic language in the world after Arabic, and the third-largest trade language across a wide swath of West African countries, after English and French. In this paper, we curated several datasets containing Hausa-English parallel corpora for our translation task. We trained baseline models and evaluated their performance using Recurrent and Transformer encoder-decoder architectures with two tokenization approaches: standard word-level tokenization and Byte Pair Encoding (BPE) subword…
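To make the contrast between word-level and BPE subword tokenization concrete, here is a minimal sketch of how BPE merges can be learned and applied. It is an illustration of the general technique only, not the paper's pipeline: the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy inputs are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Start from characters, with an assumed end-of-word marker "</w>".
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into subword units by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

A word-level tokenizer would treat each whitespace-separated word as a single vocabulary entry, so any word unseen in training becomes an unknown token. With the sketch above, an unseen word can still be segmented into known subword units, which is the usual motivation for BPE in low-resource settings.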
Taxonomy
Methods
Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Multi-Head Attention · Adam · Dropout · Byte Pair Encoding
