Distinctive-attribute Extraction for Image Captioning
Boeun Kim, Young Han Lee, Hyedong Jung, Choongsang Cho

TL;DR
This paper introduces a distinctive-attribute extraction method for image captioning that leverages TF-IDF analysis to improve caption accuracy and detail by emphasizing significant semantic attributes.
Contribution
It proposes a novel approach that explicitly extracts distinctive attributes using TF-IDF to enhance image captioning performance.
Findings
Improved caption detail and accuracy on challenge datasets.
Explicit attribute extraction enhances semantic relevance.
Method outperforms baseline models in objective metrics.
Abstract
Image captioning, an open research issue, has evolved with the progress of deep neural networks. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are employed to compute image features and generate natural language descriptions. In previous works, a caption involving semantic description can be generated by feeding additional information into the RNNs. In this paper, we propose distinctive-attribute extraction (DaE), which explicitly encourages significant meanings in order to generate an accurate caption that describes the overall meaning of the image along with its unique situation. Specifically, the captions of training images are analyzed by term frequency-inverse document frequency (TF-IDF), and the analyzed semantic information is used to train a model that extracts distinctive attributes for inferring captions. The proposed scheme is evaluated on a challenge…
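To make the TF-IDF step in the abstract concrete, here is a minimal sketch of scoring caption words per image. It is an illustration under stated assumptions, not the authors' implementation: each image's reference captions are pooled into one "document", the weighting is the textbook tf × log(N/df) form (the paper's exact variant is not given in this excerpt), and the function names `tfidf_per_image` and `top_attributes` as well as the toy captions are hypothetical.

```python
import math
from collections import Counter


def tfidf_per_image(captions_by_image):
    """Return {image_id: {word: tf-idf score}}.

    Assumption: all captions of one image are pooled into a single document,
    and the score is tf * log(N / df), a standard TF-IDF variant.
    """
    # Word counts per image, pooled over that image's captions.
    docs = {
        img: Counter(w for cap in caps for w in cap.lower().split())
        for img, caps in captions_by_image.items()
    }
    n_docs = len(docs)

    # Document frequency: in how many images' captions each word appears.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    scores = {}
    for img, counts in docs.items():
        total = sum(counts.values())
        scores[img] = {
            w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()
        }
    return scores


def top_attributes(scores, image_id, k=3):
    """Pick the k highest-scoring words for one image (ties broken alphabetically)."""
    ranked = sorted(scores[image_id].items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


# Toy captions, made up for illustration only.
captions = {
    "img1": ["a man rides a surfboard on a wave", "a surfer on a big wave"],
    "img2": ["a man sits on a bench", "a man on a park bench"],
    "img3": ["a dog runs on a beach", "a dog on the sand"],
}
scores = tfidf_per_image(captions)
```

Words that occur in every image's captions (here "a" and "on") receive a score of zero, while words concentrated in one image's captions (such as "wave" for `img1`) rank highest. Per the abstract, scores of this kind serve as the semantic information from which distinctive attributes are learned; how the prediction model consumes them is not covered by this sketch.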
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
