Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association

Qiyu Wu; Mengjie Zhao; Yutong He; Lang Huang; Junya Ono; Hiromi Wakaki; Yuki Mitsufuji

arXiv:2310.01330·cs.CV·October 3, 2023

Open Access

TL;DR

This paper introduces BiAug, a bimodal augmentation method that uses object-attribute decoupling and synthetic data generation to reduce reporting bias in visual-language datasets, enhancing model understanding and zero-shot retrieval performance.

Contribution

The paper presents a novel bimodal augmentation approach leveraging LLMs and inpainting to explicitly address reporting bias in object-attribute associations within visual-language datasets.

Findings

1. BiAug improves object-attribute understanding.
2. BiAug enhances zero-shot retrieval on MSCOCO and Flickr30K.
3. Mitigating reporting bias leads to richer visual-language models.

Abstract

Reporting bias arises when people assume that some knowledge is universally understood and hence does not need explicit elaboration. In this paper, we focus on the widespread reporting bias in visual-language datasets, embodied as the object-attribute association, which can subsequently degrade models trained on them. To mitigate this bias, we propose a bimodal augmentation (BiAug) approach through object-attribute decoupling to flexibly synthesize visual-language examples with a rich array of object-attribute pairings and construct cross-modal hard negatives. We employ large language models (LLMs) in conjunction with a grounding object detector to extract target objects. Subsequently, the LLM generates a detailed attribute description for each object and produces a corresponding hard negative counterpart. An inpainting model is then used to create images based on these…
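The pipeline outlined in the abstract can be pictured roughly as follows. This is only a minimal sketch, not the authors' implementation: the function `biaug_sketch`, the `AugmentedExample` record, and the four callables (`extract_objects`, `ground`, `describe`, `inpaint`) are hypothetical names introduced here for illustration, and the toy stand-ins at the bottom replace the real LLM, grounding detector, and inpainting model. Because the abstract is truncated, the sketch assumes the inpainting step is conditioned on the generated descriptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x0, y0, x1, y1) pixel box


@dataclass
class AugmentedExample:
    """One synthesized pair plus its cross-modal hard negative (hypothetical record)."""
    obj: str
    positive_caption: str
    negative_caption: str
    positive_image: Any
    negative_image: Any


def biaug_sketch(
    image: Any,
    caption: str,
    extract_objects: Callable[[str], List[str]],      # stand-in for the LLM object extractor
    ground: Callable[[Any, str], Optional[Box]],      # stand-in for the grounding detector
    describe: Callable[[str, str], Tuple[str, str]],  # stand-in for the LLM: (attribute, hard-negative attribute)
    inpaint: Callable[[Any, Box, str], Any],          # stand-in for the inpainting model
) -> List[AugmentedExample]:
    examples = []
    for obj in extract_objects(caption):
        box = ground(image, obj)
        if box is None:
            # Skip objects the detector cannot localize in the image.
            continue
        pos_attr, neg_attr = describe(obj, caption)
        pos_caption = f"{pos_attr} {obj}"
        neg_caption = f"{neg_attr} {obj}"
        examples.append(
            AugmentedExample(
                obj=obj,
                positive_caption=pos_caption,
                negative_caption=neg_caption,
                # Assumption: the object region is regenerated from each description.
                positive_image=inpaint(image, box, pos_caption),
                negative_image=inpaint(image, box, neg_caption),
            )
        )
    return examples


# Toy stand-ins so the sketch runs without any model; all values are made up.
def toy_extract(caption: str) -> List[str]:
    return [w for w in ("banana", "table") if w in caption]


def toy_ground(image: Any, obj: str) -> Optional[Box]:
    return {"banana": (10, 10, 50, 40)}.get(obj)


def toy_describe(obj: str, caption: str) -> Tuple[str, str]:
    return ("yellow", "green")


def toy_inpaint(image: Any, box: Box, prompt: str) -> Any:
    return (image, box, prompt)  # a real model would return edited pixels


examples = biaug_sketch(
    "img.jpg", "a banana on a table",
    toy_extract, toy_ground, toy_describe, toy_inpaint,
)
```

Passing the models in as callables keeps the sketch independent of any particular LLM, detector, or inpainting library; each stand-in could be swapped for a real model without changing the loop.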

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques

Methods: Focus · Inpainting