OPCap: Object-aware Prompting Captioning

Feiyang Huang

arXiv:2412.00095 · cs.CV · January 20, 2025

TL;DR

OPCap introduces an object-aware prompting method that leverages object detection and attribute refinement to reduce hallucination and improve image captioning quality, validated on COCO and nocaps datasets.

Contribution

The paper presents a novel target-aware prompting strategy that integrates object labels and attributes to enhance captioning accuracy and reduce hallucination.

Findings

01. Significant reduction in object hallucination in captions.
02. Improved caption quality on COCO and nocaps datasets.
03. Effective integration of object detection and attribute prediction.

Abstract

In image captioning, describing an image in terms of missing or nonexistent objects is referred to as object bias (or hallucination). To mitigate this issue, we propose a target-aware prompting strategy. This method first extracts object labels and their spatial information from the image using an object detector. An attribute predictor then refines the semantic features of the detected objects. The refined features are integrated and fed into the decoder, enhancing the model's understanding of the image context. Experimental results on the COCO and nocaps datasets demonstrate that OPCap effectively mitigates hallucination and significantly improves the quality of generated captions.
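
The pipeline described in the abstract (detector output, then attribute refinement, then a combined input for the decoder) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract describes integrating refined features into the decoder, whereas the sketch below makes the simplifying assumption that the combined information is rendered as a plain-text prompt. The names `Detection`, `coarse_position`, and `build_object_prompt`, the `min_score` threshold, and the three-way left/center/right binning are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Detection:
    """One detected object (hypothetical container, not from the paper)."""
    label: str
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    score: float                            # detector confidence in [0, 1]
    attributes: List[str] = field(default_factory=list)  # e.g. ["brown"]


def coarse_position(box: Tuple[float, float, float, float], image_width: float) -> str:
    """Map a box to a coarse horizontal position using its center (assumed binning)."""
    x_center = (box[0] + box[2]) / 2
    third = image_width / 3
    if x_center < third:
        return "left"
    if x_center < 2 * third:
        return "center"
    return "right"


def build_object_prompt(
    detections: List[Detection],
    image_width: float,
    min_score: float = 0.5,  # assumed threshold; low-confidence boxes are dropped
) -> str:
    """Combine labels, attributes, and coarse positions into one prompt string."""
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.score, reverse=True)  # most confident objects first
    parts = []
    for d in kept:
        phrase = " ".join(d.attributes + [d.label])
        parts.append(f"{phrase} ({coarse_position(d.box, image_width)})")
    if not parts:
        return "Objects: none detected."
    return "Objects: " + "; ".join(parts) + "."


if __name__ == "__main__":
    detections = [
        Detection("dog", (10, 20, 110, 200), 0.9, ["brown"]),
        Detection("frisbee", (400, 50, 460, 100), 0.8, ["red"]),
        Detection("cat", (0, 0, 10, 10), 0.2),  # below threshold, filtered out
    ]
    print(build_object_prompt(detections, image_width=480))
```

Dropping low-confidence detections before building the prompt reflects the motivation stated in the abstract: a prompt should mention only objects that are likely present, so that the decoder is not nudged toward describing objects absent from the image.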

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques