Image Captioning based on Deep Learning Methods: A Survey

Yiyu Wang; Jungang Xu; Yingfei Sun; Ben He

arXiv:1905.08110 · cs.CV · May 21, 2019 · 6 citations

TL;DR

This survey reviews recent deep learning techniques for image captioning, covering encoder-decoder architectures, improvements, and future research directions in the field.

Contribution

It provides a comprehensive overview of advancements in deep learning-based image captioning and discusses potential future research areas.

Findings

1. Deep learning has significantly advanced image captioning.
2. Encoder-decoder structures are central to current methods.
3. Future research should focus on improving accuracy and efficiency.

Abstract

Image captioning is a challenging task that is attracting increasing attention in the field of Artificial Intelligence. It can be applied to efficient image retrieval, intelligent guidance for blind people, and human-computer interaction, among other areas. In this paper, we present a survey of advances in image captioning based on Deep Learning methods, covering the Encoder-Decoder structure, improved methods in the Encoder, improved methods in the Decoder, and other improvements. Furthermore, we discuss future research directions.
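The abstract names the Encoder-Decoder structure as the backbone of the surveyed methods. The following is a minimal sketch of that data flow, assuming a pre-extracted image feature vector (standing in for a CNN encoder's output), a vanilla RNN decoder, and greedy word-by-word decoding. The class `ToyCaptioner`, its dimensions, and the tiny vocabulary are hypothetical, and the weights are random rather than trained, so the generated words are meaningless. It illustrates the general encode-then-decode loop, not any specific model covered by the survey.

```python
import math
import random
from typing import List


def matvec(m: List[List[float]], v: List[float]) -> List[float]:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    """Small random weights; a real captioner would learn these from data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyCaptioner:
    """Hypothetical encoder-decoder captioner with untrained, random weights."""

    def __init__(self, vocab: List[str], feat_dim: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.index = {w: i for i, w in enumerate(vocab)}
        # Encoder: projects an image feature vector to the decoder's initial hidden state.
        self.W_enc = rand_matrix(hidden, feat_dim, rng)
        # Decoder: word embeddings, recurrent weights, input weights, output projection.
        self.E = rand_matrix(len(vocab), hidden, rng)
        self.W_h = rand_matrix(hidden, hidden, rng)
        self.W_x = rand_matrix(hidden, hidden, rng)
        self.W_out = rand_matrix(len(vocab), hidden, rng)

    def encode(self, features: List[float]) -> List[float]:
        """h0 = tanh(W_enc * features)."""
        return [math.tanh(z) for z in matvec(self.W_enc, features)]

    def step(self, h: List[float], word: str) -> List[float]:
        """Vanilla RNN update: h' = tanh(W_h h + W_x e(word))."""
        x = self.E[self.index[word]]
        return [math.tanh(a + b) for a, b in zip(matvec(self.W_h, h), matvec(self.W_x, x))]

    def greedy_decode(self, features: List[float], max_len: int = 10) -> List[str]:
        """Emit the highest-scoring word at each step until <end> or max_len."""
        h = self.encode(features)
        word, caption = "<start>", []
        start_id = self.index["<start>"]
        for _ in range(max_len):
            h = self.step(h, word)
            logits = matvec(self.W_out, h)
            # Argmax over logits (softmax is monotonic, so it is skipped);
            # the <start> token is excluded from the candidates.
            best = max((i for i in range(len(logits)) if i != start_id),
                       key=logits.__getitem__)
            word = self.vocab[best]
            if word == "<end>":
                break
            caption.append(word)
        return caption


if __name__ == "__main__":
    vocab = ["<start>", "<end>", "a", "dog", "on", "grass", "runs"]
    model = ToyCaptioner(vocab, feat_dim=4, hidden=8, seed=42)
    print(model.greedy_decode([0.2, -1.0, 0.5, 0.3], max_len=6))
```

Greedy decoding is used here only because it is the simplest search; a trained system would replace the random weights with learned ones and might use a different decoder or search strategy.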

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition