Robust Image Captioning
Daniel Yarnell, Xian Wang

TL;DR
This paper introduces a robust image captioning method that represents spatial relations between objects as a graph and applies an adversarial robust cut algorithm to improve attention. The authors report promising experimental results.
Contribution
It proposes a novel approach combining object relation graphs with adversarial robust cut algorithms for enhanced image captioning.
Findings
Demonstrates improved captioning performance on benchmark datasets.
Shows robustness of the method against adversarial perturbations.
Highlights the importance of spatial object relations in captioning accuracy.
Abstract
Automated image captioning is a task that combines the difficulties of image analysis and text generation. One essential component of captioning is attention: deciding what to describe and in which order. In this study, we leverage object relations using an adversarial robust cut algorithm, which builds on the attention approach by explicitly embedding knowledge about the spatial associations between the input data through a graph representation. Our experimental study shows the promising performance of the proposed method for image captioning.
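The abstract does not say how the graph over spatial associations is built. As a rough illustration only, the sketch below links detected objects whose bounding boxes overlap and weights each edge by intersection-over-union (IoU). The IoU criterion, the `(x1, y1, x2, y2)` box format, and the names `iou` and `build_relation_graph` are assumptions made for this example, not the authors' method, and the adversarial robust cut step is not covered.

```python
from itertools import combinations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def build_relation_graph(boxes, threshold=0.0):
    """Hypothetical helper: undirected graph as {node: {neighbor: weight}}.

    Two objects are connected when the IoU of their boxes exceeds
    `threshold`; the IoU value is stored as the edge weight.
    """
    graph = {i: {} for i in range(len(boxes))}
    for i, j in combinations(range(len(boxes)), 2):
        weight = iou(boxes[i], boxes[j])
        if weight > threshold:
            graph[i][j] = weight
            graph[j][i] = weight
    return graph


# Illustrative boxes: the first two overlap, the third is isolated.
boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (10, 10, 12, 12)]
graph = build_relation_graph(boxes)
```

Here nodes 0 and 1 share an edge and node 2 has no neighbors. Other spatial cues, such as center distance or relative position, could replace IoU as the edge weight under the same structure.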
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
