Automated Image Captioning for Rapid Prototyping and Resource Constrained Environments

Karan Sharma; Arun CS Kumar; Suchendra Bhandarkar

arXiv:1606.01393 · cs.CV · June 7, 2016

TL;DR

This paper proposes a scalable, resource-efficient approach to automated image captioning that combines the detection of a few top objects with word embeddings to infer actions, making it suitable for resource-constrained environments.

Contribution

It introduces the insight that detecting a few key objects in an image is enough to generate relevant captions and to infer actions, and it emphasizes simplicity and scalability in the design of image captioning systems.

Findings

01. Effective action prediction using object detection and word embeddings

02. Reasonable captioning performance achieved with low complexity

03. Reduced system development time and resource requirements

Abstract

Significant performance gains in deep learning, coupled with the exponential growth of image and video data on the Internet, have resulted in the recent emergence of automated image captioning systems. Ensuring the scalability of automated image captioning systems with respect to the ever-increasing volume of image and video data is a significant challenge. This paper provides a valuable insight: the detection of a few significant (top) objects in an image allows one to extract other relevant information, such as the actions (verbs) in the image. We expect this insight to be useful in the design of scalable image captioning systems. We address two parameters by which the scalability of image captioning systems could be quantified: the traditional algorithmic time complexity, which is important given the resource limitations of the user device, and the system development time, since the…
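The central idea, that a few top detected objects carry enough information to infer the action, can be illustrated with a toy sketch. It assumes a detector that returns (label, confidence) pairs, a small hand-made embedding table standing in for pretrained word vectors, and a hypothetical scoring rule (mean cosine similarity between each candidate verb and the top-k objects). All names, vectors, and the caption template are illustrative assumptions, not the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical toy embeddings over five hand-picked dimensions
# (play, animal, person, ride, food); a real system would load
# pretrained word vectors instead.
EMBEDDINGS: Dict[str, Vector] = {
    "dog": (0.6, 0.8, 0.0, 0.0, 0.1),
    "frisbee": (0.9, 0.0, 0.0, 0.0, 0.0),
    "man": (0.1, 0.0, 0.9, 0.3, 0.1),
    "horse": (0.0, 0.7, 0.0, 0.7, 0.1),
    "pizza": (0.0, 0.0, 0.0, 0.0, 1.0),
    "catch": (0.9, 0.3, 0.2, 0.0, 0.0),
    "ride": (0.0, 0.3, 0.4, 0.9, 0.0),
    "eat": (0.0, 0.2, 0.3, 0.0, 0.9),
}
# Hypothetical closed set of candidate actions.
CANDIDATE_VERBS = ["catch", "ride", "eat"]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_objects(detections: Sequence[Tuple[str, float]], k: int) -> List[str]:
    """Keep the labels of the k highest-confidence detections."""
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
    return [label for label, _ in ranked[:k]]


def infer_verb(objects: Sequence[str], verbs: Sequence[str],
               emb: Dict[str, Vector]) -> str:
    """Pick the verb whose embedding is, on average, closest to the objects'."""
    if not objects:
        raise ValueError("need at least one detected object")

    def score(verb: str) -> float:
        return sum(cosine(emb[verb], emb[o]) for o in objects) / len(objects)

    return max(verbs, key=score)


def caption(detections: Sequence[Tuple[str, float]], k: int = 2) -> str:
    """Build a subject-verb-object style caption from the top-k objects."""
    objects = top_objects(detections, k)
    verb = infer_verb(objects, CANDIDATE_VERBS, EMBEDDINGS)
    # Naive template: the highest-confidence object is treated as the subject.
    return " ".join([objects[0], verb] + objects[1:])


if __name__ == "__main__":
    print(caption([("dog", 0.95), ("frisbee", 0.88), ("man", 0.40)]))  # prints "dog catch frisbee"
    print(caption([("man", 0.93), ("horse", 0.90), ("pizza", 0.05)]))  # prints "man ride horse"
```

In this sketch, apart from sorting the detections, verb inference costs O(k·|V|·d) similarity operations for k top objects, |V| candidate verbs, and embedding dimension d, so keeping k small keeps the step cheap. Low-confidence detections (the "man" in the first call, the "pizza" in the second) are dropped before any verb is scored.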

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition