Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, C. Lawrence Zitnick

TL;DR
This paper introduces the Microsoft COCO Caption dataset with over 330,000 images and 1.5 million captions, along with an evaluation server that scores caption quality using multiple metrics to standardize automatic captioning evaluation.
Contribution
It presents a large-scale caption dataset and an evaluation server, enabling consistent benchmarking of image captioning algorithms.
Findings
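When completed, the dataset will contain over 1.5 million captions describing over 330,000 images, with five independent human-generated captions for each training and validation image.
The evaluation server scores candidate captions with several popular metrics, including BLEU, METEOR, ROUGE and CIDEr.
The server provides a standardized platform for assessing automatic caption generation algorithms.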
Abstract
In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When completed, the dataset will contain over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human generated captions will be provided. To ensure consistency in evaluation of automatic caption generation algorithms, an evaluation server is used. The evaluation server receives candidate captions and scores them using several popular metrics, including BLEU, METEOR, ROUGE and CIDEr. Instructions for using the evaluation server are provided.
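To give a feel for what scoring a candidate caption against several human references involves, here is a minimal sketch of the clipped (modified) n-gram precision that BLEU is built on. It is an illustration only and not the evaluation server's code: the function names `ngrams` and `clipped_precision` are hypothetical, tokenization is assumed to be lowercase whitespace splitting, and full BLEU additionally combines several n-gram orders and applies a brevity penalty, both omitted here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n=1):
    """Modified n-gram precision of one candidate against several references.

    Each candidate n-gram count is clipped to the largest count of that
    n-gram found in any single reference, so repeating a matching word
    does not inflate the score.
    Assumption: simple lowercase whitespace tokenization.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    if not cand_counts:
        return 0.0

    # For every n-gram, remember the highest count seen in any one reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref.lower().split(), n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    matched = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / sum(cand_counts.values())


if __name__ == "__main__":
    refs = ["a man rides a horse", "a person riding a brown horse"]
    print(clipped_precision("a man riding a horse", refs, n=1))
    print(clipped_precision("a man riding a horse", refs, n=2))
```

Having several references per image matters in this sketch: a candidate n-gram counts as matched if it appears in any one of the references, so valid paraphrases are penalized less than they would be against a single reference.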
Code & Models
- 🤗 BridgeTower/bridgetower-base (model) · 3.5k dl · ♡ 10
- 🤗 BridgeTower/bridgetower-large-itm-mlm (model) · 4 dl · ♡ 1
- 🤗 BridgeTower/bridgetower-base-itm-mlm (model) · 3.6k dl · ♡ 3
- 🤗 BridgeTower/bridgetower-large-itm-mlm-gaudi (model) · 11 dl · ♡ 2
- 🤗 BridgeTower/bridgetower-large-itm-mlm-itc (model) · 124k dl · ♡ 12
- 🤗 kp-forks/bridgetower-large-itm-mlm-itc (model) · 1 dl
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
