Rich Image Captioning in the Wild

Kenneth Tran; Xiaodong He; Lei Zhang; Jian Sun; Cornelia Carapcea; Chris Thrasher; Chris Buehler; Chris Sienkiewicz

arXiv:1603.09016 · cs.CV · April 1, 2016 · 41 cites

TL;DR

This paper introduces a comprehensive image captioning system capable of generating high-quality, human-like descriptions for images in diverse, real-world scenarios, emphasizing out-of-domain robustness and low latency.

Contribution

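Built on a state-of-the-art captioning framework, it adds a deep vision model that detects a broad range of visual concepts, an entity recognition model that identifies celebrities and landmarks, and a confidence model for the caption output, improving caption quality and out-of-domain handling.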

Findings

01 Outperforms previous state-of-the-art systems on the in-domain MS COCO dataset
02 Outperforms previous state-of-the-art systems on out-of-domain datasets
03 Achieves low latency in caption generation

Abstract

We present an image caption system that addresses new challenges of automatically describing images in the wild. The challenges include high caption quality with respect to human judgments, out-of-domain data handling, and the low latency required in many applications. Built on top of a state-of-the-art framework, we developed a deep vision model that detects a broad range of visual concepts, an entity recognition model that identifies celebrities and landmarks, and a confidence model for the caption output. Experimental results show that our caption engine significantly outperforms previous state-of-the-art systems on both the in-domain dataset (i.e., MS COCO) and out-of-domain datasets.
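The abstract names three components layered over a base captioner: visual concept detection, entity recognition, and a confidence model. The following is a minimal, hypothetical sketch of how such pieces could be composed. It assumes the base framework yields scored caption candidates, that the entity recognizer returns a name together with the generic caption phrase it should replace, and that the confidence model is a logistic regression over caption features. None of the names (`Entity`, `insert_entity`, `confidence`, `caption_image`), thresholds, or weights come from the paper; they are illustrative assumptions only.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical sketch: none of these names, thresholds, or weights come from the paper.


@dataclass
class Entity:
    """Output of a hypothetical celebrity/landmark recognizer."""
    name: str            # recognized entity, e.g. a landmark name
    generic_phrase: str  # generic caption phrase the name should replace
    score: float         # recognizer confidence in [0, 1]


def insert_entity(caption: str, entity: Optional[Entity], min_score: float = 0.5) -> str:
    """Swap the first occurrence of the generic phrase for the entity name,
    but only when the recognizer is confident and the phrase is present."""
    if entity is None or entity.score < min_score:
        return caption
    if entity.generic_phrase not in caption:
        return caption
    return caption.replace(entity.generic_phrase, entity.name, 1)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def confidence(features: List[float], weights: List[float], bias: float) -> float:
    """Assumed confidence model: logistic regression, sigmoid(w . x + b)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def caption_image(
    candidates: List[Tuple[str, float]],
    entity: Optional[Entity],
    features: List[float],
    weights: List[float],
    bias: float,
    threshold: float = 0.5,
) -> Tuple[str, float, bool]:
    """Pick the highest-scoring candidate from the base captioner, enrich it
    with a recognized entity, and flag whether its confidence clears the threshold."""
    best_caption, _ = max(candidates, key=lambda c: c[1])
    caption = insert_entity(best_caption, entity)
    score = confidence(features, weights, bias)
    return caption, score, score >= threshold
```

Keeping entity insertion separate from candidate generation means the base captioner would not need retraining when the celebrity or landmark vocabulary grows; this is a design choice of the sketch, not a claim about how the paper's system is built. Likewise, the sketch only flags low-confidence captions and does not guess what the real system does with them.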

Videos

Rich Image Captioning in the Wild (YouTube)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition