Distinctive-attribute Extraction for Image Captioning

Boeun Kim; Young Han Lee; Hyedong Jung; Choongsang Cho

arXiv:1807.09434·cs.CV·July 26, 2018

TL;DR

This paper introduces a distinctive-attribute extraction method for image captioning that leverages TF-IDF analysis to improve caption accuracy and detail by emphasizing significant semantic attributes.

Contribution

It proposes a novel approach that explicitly extracts distinctive attributes using TF-IDF to enhance image captioning performance.

Findings

1. Improved caption detail and accuracy on challenge datasets.

2. Explicit attribute extraction enhances semantic relevance.

3. Method outperforms baseline models in objective metrics.

Abstract

Image captioning, an open research problem, has evolved with the progress of deep neural networks. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are employed to compute image features and generate natural-language descriptions. In previous works, captions containing semantic descriptions were generated by feeding additional information into the RNNs. In this paper, we propose distinctive-attribute extraction (DaE), which explicitly encourages significant meanings in order to generate an accurate caption that describes the overall meaning of the image together with its unique situation. Specifically, the captions of training images are analyzed by term frequency-inverse document frequency (TF-IDF), and the analyzed semantic information is used in training to extract distinctive attributes for inferring captions. The proposed scheme is evaluated on a challenge…
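To make the TF-IDF step concrete, here is a minimal sketch of how caption words could be scored for distinctiveness. It is not the paper's implementation. It assumes that all captions of one image are pooled into a single document, that tokenization is a plain whitespace split, and that the standard unsmoothed formula tf × log(N / df) is used. The function name `tfidf_per_image` and the toy captions are hypothetical.

```python
import math
from collections import Counter


def tfidf_per_image(captions_by_image):
    """Score each word of each image's captions with TF-IDF.

    Assumption: the captions of one image are pooled into one document.
    Returns {image_id: {word: score}}.
    """
    # Pool and tokenize the captions of each image (naive whitespace split).
    docs = {
        image_id: " ".join(captions).lower().split()
        for image_id, captions in captions_by_image.items()
    }
    n_docs = len(docs)

    # Document frequency: in how many images' captions each word appears.
    doc_freq = Counter()
    for words in docs.values():
        doc_freq.update(set(words))

    scores = {}
    for image_id, words in docs.items():
        term_counts = Counter(words)
        total = len(words)
        scores[image_id] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in term_counts.items()
        }
    return scores


# Hypothetical toy captions, only for illustration.
captions = {
    "img1": ["a man riding a surfboard on a wave", "a surfer rides a large wave"],
    "img2": ["a man sitting on a bench", "a man rests on a park bench"],
}
scores = tfidf_per_image(captions)
# Words sorted from most to least distinctive for img1.
ranked_img1 = sorted(scores["img1"], key=scores["img1"].get, reverse=True)
```

In this toy example, words shared by both images (such as "a" or "man") get a score of zero, while words specific to one image (such as "wave") score higher. Scores of this kind could serve as training targets for an attribute extractor, though how the paper builds its targets is not specified in the text above.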

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition