Image captioning in different languages
Emiel van Miltenburg

TL;DR
This paper highlights the scarcity of non-English image captioning datasets, providing a curated list and discussing the limited language diversity compared to the vast number of existing languages.
Contribution
It offers a curated list of non-English image captioning datasets and discusses the significant gap in language coverage in the field.
Findings
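Only 23 languages are represented in the curated list of non-English image captioning datasets (as of May 2024).
Adding the Crossmodal-3600 dataset (36 languages) increases this number somewhat.
Coverage remains small compared to the roughly 500 institutional languages, so most languages lack a dedicated image captioning dataset.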
Abstract
This short position paper provides a manually curated list of non-English image captioning datasets (as of May 2024). Through this list, we can observe the dearth of datasets in different languages: only 23 different languages are represented. With the addition of the Crossmodal-3600 dataset (Thapliyal et al., 2022, 36 languages) this number increases somewhat, but still this number is small compared to the +/-500 institutional languages that are out there. This paper closes with some open questions for the field of Vision & Language.
Taxonomy
Topics: Subtitles and Audiovisual Media · Multimodal Machine Learning Applications · Translation Studies and Practices
