Visual Semantic Relatedness Dataset for Image Captioning

Ahmed Sabir; Francesc Moreno-Noguer; Lluís Padró

arXiv:2301.08784·cs.CL·May 2, 2023

TL;DR

This paper introduces a new dataset that extends COCO Captions with scene-related textual information, enabling better integration of NLP techniques into image captioning systems.

Contribution

It provides a textual visual context dataset that enhances image captioning by incorporating scene information, bridging computer vision and NLP.

Findings

1. Enables the use of NLP methods for captioning tasks
2. Facilitates end-to-end training or post-processing approaches
3. Improves semantic understanding in image captioning

Abstract

Modern image captioning systems rely heavily on extracting knowledge from images to capture the concept of a static story. In this paper, we propose a textual visual context dataset for captioning, in which the publicly available COCO Captions dataset (Lin et al., 2014) has been extended with information about the scene (such as objects in the image). Since this information is textual, it can be used to bring any NLP task, such as text similarity or semantic relation methods, into captioning systems, either as an end-to-end training strategy or as a post-processing approach.
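
To make the post-processing idea concrete, here is a minimal sketch that re-ranks candidate captions by their textual similarity to the visual context of an image. It is not the authors' method: the bag-of-words cosine similarity is a simple stand-in for whichever text similarity or semantic relation model is used, and the function names (`bag_of_words`, `cosine`, `rerank`), the candidate captions, and the visual context string are all hypothetical and not taken from the dataset.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank(captions, visual_context):
    """Sort captions so the ones most similar to the visual context come first."""
    ctx = bag_of_words(visual_context)
    return sorted(captions, key=lambda c: cosine(bag_of_words(c), ctx), reverse=True)


# Hypothetical example: two candidate captions from a captioning system,
# plus textual visual context (e.g. object labels detected in the image).
candidates = [
    "a man riding a wave on a board",
    "a man riding a surfboard in the ocean",
]
visual_context = "surfboard ocean"

ranked = rerank(candidates, visual_context)
print(ranked[0])
```

Because the visual context is plain text, the scoring function can be swapped for a stronger sentence similarity model without changing the re-ranking step.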

Code & Models

Repositories

ahmedssabir/textual-visual-semantic-dataset
tf · Official

Models

AhmedSSabir/BERT-CNN-Visual-Semantic
Hugging Face model

Datasets

AhmedSSabir/Textual-Image-Caption-Dataset
dataset · 111 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques