TL;DR
The paper introduces #PraCegoVer, a large Portuguese image captioning dataset built from social media posts. It addresses the scarcity of captioning datasets in languages other than English, and its captions pose linguistic challenges for image captioning models.
Contribution
It is the first large Portuguese dataset for image captioning with freely annotated images, built from social media posts tagged #PraCegoVer, and its captions have distinctive linguistic and annotation characteristics.
Findings
The dataset contains only one reference caption per image.
Caption lengths vary more than in existing datasets.
The dataset provides a new resource for multilingual image captioning research.
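As a rough illustration of what "caption lengths vary more" means, the minimal sketch below computes the mean and standard deviation of caption lengths. It assumes length is counted in whitespace-separated words, which may differ from the tokenization used in the paper. The function name `caption_length_stats` and the toy captions are hypothetical and are not taken from the dataset.

```python
from statistics import mean, pstdev


def caption_length_stats(captions):
    """Return (mean, population std dev) of caption lengths in words.

    Assumption: a "word" is any whitespace-separated token.
    """
    lengths = [len(caption.split()) for caption in captions]
    return mean(lengths), pstdev(lengths)


# Hypothetical toy captions for illustration only (not from #PraCegoVer).
toy_captions = [
    "Foto de um cachorro na praia.",
    "Ilustração com fundo azul e texto branco ao centro, "
    "acompanhado de um logotipo no canto inferior direito.",
]

avg_len, std_len = caption_length_stats(toy_captions)
print(f"mean length: {avg_len:.1f} words, std dev: {std_len:.1f} words")
```

A larger standard deviation relative to the mean indicates that the captions differ more in length, which is the property the finding above refers to.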
Abstract
Automatically describing images using natural sentences is an important task to support visually impaired people's inclusion on the Internet. It is still a big challenge that requires understanding the relations among the objects present in the image, their attributes, and the actions they are involved in. Thus, visual interpretation methods are needed, but linguistic models are also necessary to verbally describe the semantic relations. This problem is known as Image Captioning. Although many datasets have been proposed in the literature, the majority contain only English captions, whereas datasets with captions in other languages are scarce. Recently, a movement called PraCegoVer arose on the Internet, encouraging social media users to publish images, tag them with #PraCegoVer, and add a short description of their content. Thus, inspired by this movement, we have proposed the #PraCegoVer,…
