Robust Image Captioning

Daniel Yarnell; Xian Wang

arXiv:2012.09732·cs.CV·December 18, 2020·1 cites

Open Access

TL;DR

This paper introduces a robust image captioning method that uses object relation graphs and adversarial training to improve attention mechanisms, demonstrating promising experimental results.

Contribution

It proposes a novel approach combining object relation graphs with adversarial robust cut algorithms for enhanced image captioning.

Findings

01

Demonstrates improved captioning performance on benchmark datasets.

02

Shows robustness of the method against adversarial perturbations.

03

Highlights the importance of spatial object relations in captioning accuracy.

Abstract

Automated image captioning is a task that combines the difficulties of image analysis and text generation. One essential aspect of captioning is attention: determining what to describe and in which order. In this study, we leverage object relations using an adversarial robust cut algorithm, which builds on the attention approach by explicitly embedding knowledge about the spatial relationships in the input data through a graph representation. Our experiments show promising performance of the proposed method for image captioning.
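
To make the idea of a graph over spatial relationships concrete, here is a minimal sketch of how an object relation graph might be built from detected bounding boxes. It is not the authors' method: the abstract gives no implementation details, and the adversarial robust cut step is not reproduced here. The function names (`iou`, `center`, `build_relation_graph`), the edge features, and the `max_dist` threshold are all assumptions chosen for illustration.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center(box):
    """Center point (cx, cy) of a box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def build_relation_graph(boxes, max_dist=0.5, width=1.0, height=1.0):
    """Return directed edges {(i, j): features} between nearby objects.

    Hypothetical rule: connect i -> j when the boxes overlap or their
    centers lie within max_dist of each other, measured as a fraction
    of the image diagonal.
    """
    diag = (width ** 2 + height ** 2) ** 0.5
    edges = {}
    for i, j in permutations(range(len(boxes)), 2):
        (cxi, cyi), (cxj, cyj) = center(boxes[i]), center(boxes[j])
        dx, dy = cxj - cxi, cyj - cyi
        dist = (dx * dx + dy * dy) ** 0.5 / diag
        overlap = iou(boxes[i], boxes[j])
        if overlap > 0 or dist <= max_dist:
            edges[(i, j)] = {
                "dx": dx / width,    # signed horizontal offset from i to j
                "dy": dy / height,   # signed vertical offset from i to j
                "dist": dist,        # center distance, normalized by diagonal
                "iou": overlap,      # box overlap
            }
    return edges


# Example with three boxes in normalized image coordinates.
boxes = [(0.0, 0.0, 0.4, 0.4), (0.2, 0.2, 0.6, 0.6), (0.9, 0.9, 1.0, 1.0)]
edges = build_relation_graph(boxes)
```

In a captioning model, edge features of this kind could be passed to a graph-based encoder so that attention is informed by where objects sit relative to one another; how the paper actually does this is not stated in the abstract.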

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition