#PraCegoVer: A Large Dataset for Image Captioning in Portuguese

Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila

arXiv:2103.11474·cs.CV·April 21, 2026

TL;DR

The paper introduces #PraCegoVer, a large Portuguese image captioning dataset built from social media posts. It addresses the scarcity of captioning data in languages other than English and poses distinct linguistic challenges for image captioning models.

Contribution

It is the first large Portuguese dataset for image captioning with freely annotated images. It is inspired by the PraCegoVer social media movement, and its captions have distinctive linguistic and annotation characteristics.

Findings

01. The dataset contains only one reference caption per image.

02. Caption lengths vary more than in existing datasets.

03. The dataset provides a new resource for multilingual image captioning research.
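Finding 02 concerns the spread of caption lengths. A minimal sketch of how such spread could be measured is given below. It assumes captions are available as plain strings and that length is counted in whitespace-separated words. The function name `caption_length_stats` and the toy captions are hypothetical and are not taken from the dataset or its repository.

```python
import statistics


def caption_length_stats(captions):
    """Return (mean, population standard deviation) of caption lengths.

    Length is measured in whitespace-separated words, which is an
    assumption made for this sketch, not the paper's stated method.
    """
    lengths = [len(caption.split()) for caption in captions]
    return statistics.mean(lengths), statistics.pstdev(lengths)


# Hypothetical toy captions, invented for illustration only.
toy_captions = [
    "Um cachorro corre na praia.",
    "Foto de uma mulher sorrindo, de cabelos longos e escuros, "
    "usando uma blusa azul em frente a uma parede branca.",
    "Gato preto.",
]

mean_len, std_len = caption_length_stats(toy_captions)
print(f"mean={mean_len:.1f} words, std={std_len:.1f} words")
```

Running the same function over two caption collections and comparing the standard deviations is one simple way to check which collection has more variable caption lengths.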

Abstract

Automatically describing images in natural-language sentences is an important task for supporting visually impaired people's inclusion on the Internet. It remains a major challenge, since it requires understanding the relations among the objects in an image, their attributes, and the actions they are involved in. Visual interpretation methods are therefore needed, but so are linguistic models that can verbally describe these semantic relations. This problem is known as Image Captioning. Although many datasets have been proposed in the literature, most contain only English captions, and datasets with captions in other languages are scarce. Recently, a movement called PraCegoVer arose on the Internet, encouraging social media users to publish images, tag them with #PraCegoVer, and add a short description of their content. Thus, inspired by this movement, we have proposed the #PraCegoVer,…

Code & Models

Repositories

gabrielsantosrv/PraCegoVer (GitHub)
