BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs
Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

TL;DR
BACON introduces a structured captioning method that dissects image descriptions into key elements, improving clarity and enabling models to perform better on tasks like object detection without additional training.
Contribution
We propose BACON, a novel prompting technique that converts dense captions into a structured JSON format, enhancing interpretability and transferability for vision-language models.
Findings
BACON-style captions improve model clarity and task performance.
Our dataset of 100,000 annotated captions enhances training.
GroundingDINO achieves 1.51x higher recall with BACON captions.
Abstract
Advancements in large Vision-Language Models have brought precise, accurate image captioning, vital for advancing multi-modal image understanding and processing. Yet these captions often carry lengthy, intertwined contexts that are difficult to parse and frequently overlook essential cues, posing a great barrier for models like GroundingDINO and SDXL, which lack the strong text encoding and syntax analysis needed to fully leverage dense captions. To address this, we propose BACON, a prompting method that breaks down VLM-generated captions into disentangled, structured elements such as objects, relationships, styles, and themes. This approach not only minimizes confusion from handling complex contexts but also allows for efficient transfer into a JSON dictionary, enabling models without linguistic processing capabilities to easily access key information. We annotated 100,000…
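To make the idea of a structured caption concrete, here is a minimal Python sketch of how a caption split into objects, relationships, style, and theme might be stored as a JSON dictionary and then read by a downstream model that cannot parse free-form text. The field names (`style`, `theme`, `objects`, `relationships`), the sample caption, and the helper functions `object_names` and `grounding_prompt` are all hypothetical illustrations, not the schema or code from the paper. The period-separated prompt is a convention often used with open-vocabulary detectors and is likewise an assumption here.

```python
import json

# Hypothetical BACON-style caption. The keys below are illustrative
# assumptions, not the exact schema defined by the paper.
caption = {
    "style": "photograph",
    "theme": "a dog playing in a park",
    "objects": [
        {"name": "dog", "description": "a brown dog running"},
        {"name": "ball", "description": "a red ball on the grass"},
    ],
    "relationships": [
        {"subject": "dog", "predicate": "chases", "object": "ball"},
    ],
}


def object_names(bacon_caption):
    """Return distinct object names in order of first appearance."""
    seen = []
    for obj in bacon_caption.get("objects", []):
        name = obj["name"]
        if name not in seen:
            seen.append(name)
    return seen


def grounding_prompt(bacon_caption):
    """Build a period-separated text prompt from the object names.

    Returns an empty string when the caption lists no objects.
    """
    names = object_names(bacon_caption)
    if not names:
        return ""
    return " . ".join(names) + " ."


# Round-trip through JSON to show the structure survives serialization.
serialized = json.dumps(caption, indent=2)
restored = json.loads(serialized)

prompt = grounding_prompt(restored)
print(prompt)
```

Because each element lives under its own key, a consumer can read only the fields it needs (here, object names) without any syntactic analysis of the original sentence.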
