Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval
Naoya Sogi, Takashi Shibata, Makoto Terao

TL;DR
This paper introduces an object-aware query perturbation method that enhances cross-modal image-text retrieval by focusing on small objects, aligning model attention more closely with human cognition without additional fine-tuning.
Contribution
It proposes a novel object-aware query perturbation framework that improves small-object retrieval in V&L models without extra fine-tuning.
Findings
Outperforms existing algorithms on four datasets.
Enhances small object retrieval accuracy.
Maintains original model performance without fine-tuning.
Abstract
Pre-trained vision and language (V&L) models have substantially improved the performance of cross-modal image-text retrieval. In general, however, V&L models have limited retrieval performance for small objects because of the rough alignment between words and the small objects in the image. In contrast, it is known that human cognition is object-centric, and we pay more attention to important objects, even if they are small. To bridge this gap between human cognition and the V&L model's capability, we propose a cross-modal image-text retrieval framework based on "object-aware query perturbation." The proposed method generates a key feature subspace of the detected objects and perturbs the corresponding queries using this subspace to improve the object awareness in the image. In our proposed method, object-aware cross-modal image-text retrieval is possible while keeping the…
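The two steps named in the abstract (build a key feature subspace from a detected object, then perturb the queries with it) can be illustrated with a small NumPy sketch. It is not the authors' implementation. Every detail below is an assumption made for illustration: the object's key subspace is taken as the top-`rank` right singular vectors of the keys of tokens inside the detected box (`obj_mask`), and the perturbation adds `alpha` times each query's orthogonal projection onto that subspace. The helper names `object_key_subspace`, `perturb_queries` and `softmax_attention`, along with the parameters `rank` and `alpha`, are hypothetical.

```python
import numpy as np


def object_key_subspace(keys: np.ndarray, obj_mask: np.ndarray, rank: int) -> np.ndarray:
    """Hypothetical helper: return a (d, rank) orthonormal basis spanning the
    principal directions of the key vectors of tokens inside a detected object."""
    obj_keys = keys[obj_mask]  # (n_obj, d): keys of tokens inside the object box
    if rank > min(obj_keys.shape):
        raise ValueError("rank exceeds the number of object keys or the key dimension")
    # Rows of vt are orthonormal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(obj_keys, full_matrices=False)
    return vt[:rank].T


def perturb_queries(queries: np.ndarray, basis: np.ndarray, alpha: float) -> np.ndarray:
    """Assumed perturbation rule: scale the part of each query that lies in
    span(basis) by (1 + alpha) and leave the orthogonal part unchanged."""
    in_subspace = queries @ basis @ basis.T  # orthogonal projection onto the subspace
    return queries + alpha * in_subspace


def softmax_attention(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Standard scaled dot-product attention weights (each row sums to 1)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys = rng.normal(size=(49, 64))     # e.g. a 7x7 patch grid with 64-dim keys (synthetic)
    queries = rng.normal(size=(49, 64))
    obj_mask = np.zeros(49, dtype=bool)
    obj_mask[[8, 9, 15, 16]] = True      # a small 2x2-patch "detected object"

    basis = object_key_subspace(keys, obj_mask, rank=3)
    q_pert = perturb_queries(queries, basis, alpha=0.5)

    before = softmax_attention(queries, keys)[:, obj_mask].sum(axis=1).mean()
    after = softmax_attention(q_pert, keys)[:, obj_mask].sum(axis=1).mean()
    print(f"mean attention mass on object tokens: {before:.3f} -> {after:.3f}")
```

Under this assumed rule the perturbed query is q' = (I - P)q + (1 + alpha)Pq, with P = BB^T. For any key k that lies inside the object subspace, the score therefore becomes q'·k = (1 + alpha) q·k. Its magnitude grows, so attention toward such a token rises only when the query already aligns positively with it. The sketch does not model how the paper handles other cases. Because no model weights are modified, nothing needs to be fine-tuned, which matches the stated goal of keeping the original model intact.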
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Data Management and Algorithms
Methods: Softmax · Attention Is All You Need
