AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages
Tosin Adewumi, Mofetoluwa Adeyemi, Aremu Anuoluwapo, Bukola Peters, Happy Buzaaba, Oyerinde Samuel, Amina Mardiyyah Rufai, Benjamin Ajibade, Tajudeen Gwadabe, Mory Moussou Koulibaly Traore, Tunde Ajayi, Shamsuddeen Muhammad, Ahmed Baruwa, Paul Owoicho, Tolulope Ogunremi

TL;DR
This paper introduces high-quality dialogue datasets for six African languages, investigates transfer learning with state-of-the-art models, and demonstrates promising cross-lingual transferability and human-like dialogue generation in low-resource African languages.
Contribution
It provides the first dialogue datasets for six African languages and analyzes the effectiveness of transfer learning with deep models for low-resource language dialogue generation.
Findings
Deep monolingual models learn abstractions that transfer across languages.
Five out of six languages show human-like conversational quality.
Nigerian Pidgin English exhibits the highest transferability with 78.1% human-likeness.
Abstract
Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda and Yorùbá. These datasets consist of 1,500 turns each, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we investigate and analyze the effectiveness of modelling through transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. Besides this, we conduct human evaluation of single-turn conversations by using majority votes and measure inter-annotator agreement (IAA). We find that…
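The abstract names two evaluation measures: perplexity for comparing models, and majority votes for the human evaluation. As a rough illustration only, the sketch below computes both with the Python standard library. The function names, the sample log-probabilities, and the vote labels are all hypothetical and not taken from the paper; the authors' actual evaluation code may differ, and inter-annotator agreement is not computed here.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def majority_vote(labels):
    """Return the label chosen by more than half of the annotators, else None."""
    if not labels:
        raise ValueError("need at least one annotator label")
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


# Hypothetical inputs for illustration (not data from the paper).
log_probs = [math.log(0.25)] * 4          # model assigns p = 0.25 to every token
votes = ["human-like", "human-like", "not human-like"]

ppl = perplexity(log_probs)               # lower perplexity means a better fit
winner = majority_vote(votes)             # label backed by most annotators
print(ppl, winner)
```

A model that assigns every token a probability of 0.25 is as uncertain as a uniform choice among four options, which is why perplexity is often read as an effective branching factor. Returning `None` when there is no strict majority is one possible design choice; a real annotation pipeline might instead break ties with an extra annotator.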
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
