STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset

Yuya Yoshikawa; Yutaro Shigeto; Akikazu Takeuchi

arXiv:1705.00823·cs.CL·May 3, 2017·21 cites

TL;DR

This paper introduces STAIR Captions, a large-scale Japanese image caption dataset derived from MS-COCO, enabling improved Japanese image captioning through neural network training.

Contribution

The creation of STAIR Captions, the first large-scale Japanese image caption dataset, with 820,310 captions for 164,062 images, which facilitates better Japanese image captioning models.

Findings

01. Neural networks trained on STAIR Captions produce more natural Japanese captions.

02. Models trained on STAIR Captions outperform translation-based methods.

03. The dataset significantly advances Japanese image captioning research.

Abstract

In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we particularly consider generating Japanese captions for images. Since most available caption datasets have been constructed for the English language, there are few datasets for Japanese. To tackle this problem, we construct a large-scale Japanese image caption dataset, called STAIR Captions, based on images from MS-COCO. STAIR Captions consists of 820,310 Japanese captions for 164,062 images. In the experiment, we show that a neural network trained using STAIR Captions can generate more natural and better Japanese captions than those obtained by first generating English captions and then applying English-Japanese machine translation.
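As a quick sanity check on the dataset size quoted in the abstract, the short sketch below computes the average number of captions per image. The two totals come from the abstract; the function and constant names are illustrative, and the abstract itself does not say whether every image has the same number of captions, so the result should be read only as an average.

```python
# Totals quoted in the abstract.
NUM_CAPTIONS = 820_310
NUM_IMAGES = 164_062


def captions_per_image(num_captions: int, num_images: int) -> float:
    """Return the average number of captions per image."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return num_captions / num_images


average = captions_per_image(NUM_CAPTIONS, NUM_IMAGES)
print(f"{average:.2f} captions per image on average")
```

The totals work out to five captions per image on average.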

Code & Models

Repositories

William-N-Havard/VGS-dataset-metadata

Datasets

shunk031/STAIR-Captions
dataset · 421 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Natural Language Processing Techniques