AFRICAPTION: Establishing a New Paradigm for Image Captioning in African Languages

Mardiyyah Oduwole; Prince Mireku; Fatimo Adebanjo; Oluwatosin Olajide; Mahi Aminu Aliyu; Jekaterina Novikova

arXiv:2510.17405·cs.CL·October 21, 2025

TL;DR

This paper introduces AfriCaption, a scalable, multilingual image captioning framework for 20 African languages, including a new dataset, a dynamic pipeline, and a large vision-to-text model to promote inclusive AI.

Contribution

It presents the first comprehensive, scalable image captioning resource and model for under-represented African languages, addressing resource scarcity and inclusivity in multimodal AI.

Findings

01. Curated a new dataset with semantically aligned captions in 20 African languages.

02. Developed a dynamic, quality-preserving captioning pipeline.

03. Built a 0.5B-parameter vision-to-text model integrating SigLIP and NLLB200.

Abstract

Multimodal AI research has overwhelmingly focused on high-resource languages, hindering the democratization of advancements in the field. To address this, we present AfriCaption, a comprehensive framework for multilingual image captioning in 20 African languages. Our contributions are threefold: (i) a curated dataset built on Flickr8k, featuring semantically aligned captions generated via a context-aware selection and translation process; (ii) a dynamic, context-preserving pipeline that ensures ongoing quality through model ensembling and adaptive substitution; and (iii) the AfriCaption model, a 0.5B-parameter vision-to-text architecture that integrates SigLIP and NLLB200 for caption generation across under-represented languages. This unified framework ensures ongoing data quality and establishes the first scalable image-captioning resource for under-represented African languages,…
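
As a rough illustration of what "model ensembling and adaptive substitution" in contribution (ii) could look like, the Python sketch below runs several translators on a source caption, keeps the highest-scoring output, and falls back to the next source caption when no output clears a quality threshold. This is not the authors' implementation: the `Candidate` type, the `translators` and `scorer` callables, the 0.5 threshold, and the fallback order are all hypothetical names and choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    system: str   # hypothetical translator name
    text: str     # translated caption
    score: float  # hypothetical quality score, higher is better


def select_caption(candidates: List[Candidate], threshold: float) -> Optional[Candidate]:
    """Return the best-scoring candidate, or None if none reaches the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.score)
    return best if best.score >= threshold else None


def caption_for_language(
    source_captions: List[str],
    translators: Dict[str, Callable[[str], str]],
    scorer: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Optional[Candidate]:
    """Ensemble the translators on each source caption in turn.

    If no translation of the current source caption passes the threshold,
    substitute the next source caption and try again.
    """
    for src in source_captions:
        candidates = []
        for name, translate in translators.items():
            out = translate(src)
            candidates.append(Candidate(name, out, scorer(src, out)))
        chosen = select_caption(candidates, threshold)
        if chosen is not None:
            return chosen
    return None  # every source caption failed the quality check
```

In practice the `translators` would wrap real machine translation systems and `scorer` would be some translation-quality estimate; here both are left as plain callables so the control flow can be read and tested in isolation.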

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Language, Metaphor, and Cognition