Mining Large Language Models for Low-Resource Language Data: Comparing Elicitation Strategies for Hausa and Fongbe

Mahounan Pericles Adjovi; Roald Eiselen; Prasenjit Mitra

arXiv:2604.12477·cs.CL·April 15, 2026

TL;DR

This study compares prompting strategies for extracting low-resource language data from LLMs, focusing on Hausa and Fongbe, and releases the resulting corpora and code.

Contribution

The paper systematically evaluates elicitation strategies for two West African languages, shows that the best prompting method depends on the language, and makes the resulting data and tools publicly available.

Findings

01

GPT-4o Mini extracts 6-41 times more usable target-language words per API call than Gemini 2.5 Flash.

02

Optimal prompts vary by language: Hausa benefits from functional text and dialogue.

03

Fongbe requires constrained generation prompts.

Abstract

Large language models (LLMs) are trained on data contributed by low-resource language communities, yet the linguistic knowledge encoded in these models remains accessible only through commercial APIs. This paper investigates whether strategic prompting can extract usable text data from LLMs for two West African languages: Hausa (Afroasiatic, approximately 80 million speakers) and Fongbe (Niger-Congo, approximately 2 million speakers). We systematically compare six elicitation task types across two commercial LLMs (GPT-4o Mini and Gemini 2.5 Flash). GPT-4o Mini extracts 6-41 times more usable target-language words per API call than Gemini. Optimal strategies differ by language: Hausa benefits from functional text and dialogue, while Fongbe requires constrained generation prompts. We release all generated corpora and code.
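The headline comparison is stated as usable target-language words per API call. A minimal sketch of how such a yield figure and a between-model ratio could be computed is shown here. The record layout, the function names, and the placeholder model names are assumptions for illustration, not the authors' code, and the numbers are made up. How a word is judged "usable" is not specified in this summary, so the sketch takes the per-call count as given.

```python
from collections import defaultdict


def words_per_call(records):
    """Return mean usable words per API call, keyed by (model, task).

    `records` is an iterable of (model, task, usable_words) tuples,
    one tuple per API call. This layout is a hypothetical choice.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [word_sum, call_count]
    for model, task, usable_words in records:
        entry = totals[(model, task)]
        entry[0] += usable_words
        entry[1] += 1
    return {key: words / calls for key, (words, calls) in totals.items()}


def yield_ratio(means, task, model_a, model_b):
    """Ratio of model_a's mean yield to model_b's for one task type."""
    return means[(model_a, task)] / means[(model_b, task)]


# Made-up counts for illustration only; not results from the paper.
records = [
    ("model_a", "dialogue", 120),
    ("model_a", "dialogue", 80),
    ("model_b", "dialogue", 10),
    ("model_b", "dialogue", 30),
]

means = words_per_call(records)
ratio = yield_ratio(means, "dialogue", "model_a", "model_b")
```

Computing the ratio separately for each task type is what produces a range such as the reported 6-41 times, rather than a single number.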
