NEUCORE: Neural Concept Reasoning for Composed Image Retrieval
Shu Zhao, Huijuan Xu

TL;DR
This paper introduces NEUCORE, a neural model for composed image retrieval that uses fine-grained multi-modal concept alignment and progressive fusion to match a reference image plus text modifier to the target image more accurately.
Contribution
The paper proposes NEUCORE, a new model that enhances multi-modal understanding by focusing on concept-level alignment and progressive fusion, addressing limitations of previous holistic approaches.
Findings
Achieves state-of-the-art results on three datasets.
Effectively models concept-level interactions between images and text.
Improves retrieval accuracy by utilizing fine-grained multi-modal fusion.
Abstract
Composed image retrieval, which combines a reference image and a text modifier to identify the desired target image, is a challenging task that requires the model to comprehend both the vision and language modalities and their interactions. Existing approaches focus on holistic multi-modal interaction modeling and ignore the composed and complementary property between the reference image and the text modifier. To better utilize the complementarity of multi-modal inputs for effective information fusion and retrieval, we move the multi-modal understanding to a fine granularity at the concept level, and learn multi-modal concept alignment to identify the visual location in the reference or target images corresponding to the text modifier. Toward this end, we propose a NEUral COncept REasoning (NEUCORE) model which incorporates multi-modal concept alignment and progressive multimodal fusion over…
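To make the task setup concrete, the following is a minimal sketch of composed image retrieval as a ranking problem. It is not the NEUCORE model: the additive fusion in compose_query is a deliberately simple placeholder for the paper's concept alignment and progressive fusion, and all function names and the toy 3-dimensional embeddings are assumptions made for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def compose_query(ref_image_emb, text_emb):
    """Hypothetical fusion: element-wise sum of the two embeddings.

    This stands in for a learned fusion module; it is NOT the
    progressive fusion described in the abstract.
    """
    return l2_normalize([r + t for r, t in zip(ref_image_emb, text_emb)])


def rank_gallery(query, gallery):
    """Return gallery item names sorted by descending cosine similarity."""
    scores = {name: cosine(query, emb) for name, emb in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (assumed values, not from the paper).
ref = [1.0, 0.0, 0.0]   # reference image, e.g. a red dress
mod = [0.0, 1.0, 0.0]   # text modifier, e.g. "make it sleeveless"
gallery = {
    "target": [1.0, 1.0, 0.0],          # shares both properties
    "reference_like": [1.0, 0.0, 0.0],  # matches the reference only
    "unrelated": [0.0, 0.0, 1.0],
}

ranking = rank_gallery(compose_query(ref, mod), gallery)
print(ranking)
```

In this toy setting the gallery item that reflects both the reference image and the modifier ranks first, which is the behavior a composed retrieval model is trained to produce on real embeddings.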
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Focus
