BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval

Wenqiao Zhang; Jiannan Guo; Mengze Li; Haochen Shi; Shengyu Zhang; Juncheng Li; Siliang Tang; Yueting Zhuang

arXiv:2207.04211 · cs.AI · July 12, 2022 · 5 cites

Open Access

TL;DR

This paper introduces BOSS, a novel approach for content-based image retrieval that combines bottom-up cross-modal semantic composition with hybrid counterfactual training to improve understanding and reduce ambiguity.

Contribution

The paper proposes a new bottom-up cross-modal semantic composition framework with hybrid counterfactual training, addressing overlooked aspects of image-text representation in CIR tasks.

Findings

1. Improved retrieval accuracy demonstrated on benchmark datasets.

2. Effective reduction of ambiguity in similar queries.

3. Enhanced understanding of cross-modal semantic interactions.

Abstract

Content-Based Image Retrieval (CIR) aims to search for a target image by concurrently comprehending the composition of an example image and a complementary text, which potentially impacts a wide variety of real-world applications, such as internet search and fashion retrieval. In this scenario, the input image serves as an intuitive context and background for the search, while the corresponding language expressly requests new traits on how specific characteristics of the query image should be modified in order to get the intended target image. This task is challenging since it necessitates learning and understanding the composite image-text representation by incorporating cross-granular semantic updates. In this paper, we tackle this task by a novel Bottom-up crOss-modal Semantic compoSition…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques