Image captioning in different languages

Emiel van Miltenburg

arXiv:2407.09495 · cs.CL · April 4, 2025

TL;DR

This paper highlights the scarcity of non-English image captioning datasets. It provides a curated list of such datasets and discusses how limited their language coverage is compared to the roughly 500 institutional languages in use.

Contribution

It offers a curated list of non-English image captioning datasets and discusses the significant gap in language coverage in the field.

Findings

1. Only 23 languages are represented in the curated list of datasets.
2. The Crossmodal-3600 dataset covers 36 languages, which increases diversity somewhat.
3. The majority of languages lack dedicated image captioning datasets.

Abstract

This short position paper provides a manually curated list of non-English image captioning datasets (as of May 2024). Through this list, we can observe the dearth of datasets in different languages: only 23 different languages are represented. With the addition of the Crossmodal-3600 dataset (Thapliyal et al., 2022, 36 languages), this number increases somewhat, but it is still small compared to the roughly 500 institutional languages in existence. This paper closes with some open questions for the field of Vision & Language.

Taxonomy

Topics: Subtitles and Audiovisual Media · Multimodal Machine Learning Applications · Translation Studies and Practices