TL;DR
This paper introduces a new task, camouflage-aware image-text retrieval (CA-ITR), builds a dedicated dataset (CamoIT), and proposes a collaborative network with a novel attention mechanism to improve retrieval accuracy in camouflaged scenes.
Contribution
It formulates the first camouflaged image-text retrieval task, constructs a specialized dataset, and develops a novel collaborative network with confidence-conditioned graph attention.
Findings
CECNet, the proposed camouflage-expert collaborative network, achieves approximately 29% improvement in overall CA-ITR accuracy.
Benchmark results highlight the challenges posed by camouflage properties.
The proposed method surpasses seven existing retrieval models.
Abstract
Camouflaged scene understanding (CSU) has attracted significant attention due to its broad practical implications. However, robust image-text cross-modal alignment remains under-explored in this field, hindering deeper understanding of camouflaged scenarios and their related applications. To this end, we focus on the typical image-text retrieval task and formulate a new task dubbed "camouflage-aware image-text retrieval" (CA-ITR). We first construct a dedicated camouflage image-text retrieval dataset (CamoIT), comprising 10.5K samples with multi-granularity textual annotations. Benchmark results on CamoIT reveal the underlying challenges that CA-ITR poses for existing cutting-edge retrieval techniques, which are mainly caused by objects' camouflage properties and by complex image content. As a solution, we propose a camouflage-expert collaborative network…
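For readers new to image-text retrieval, the sketch below shows how retrieval accuracy is commonly scored with Recall@K. It is an illustration only: the excerpt above does not say which metric the paper reports, and the function name `recall_at_k`, the toy similarity matrix, and the one-to-one pairing of images and captions are all assumptions made for this example.

```python
import numpy as np


def recall_at_k(sim: np.ndarray, k: int) -> float:
    """Fraction of queries whose true match appears in the top-k results.

    Assumption: sim[i, j] is the similarity between query i and gallery
    item j, and the correct match for query i is gallery item i.
    """
    # Score of the correct gallery item for each query, as a column vector.
    true_scores = np.diag(sim)[:, None]
    # Rank of the correct item = number of items scoring strictly higher.
    ranks = (sim > true_scores).sum(axis=1)
    return float((ranks < k).mean())


# Hypothetical 3x3 image-to-text similarity matrix (rows: images, cols: captions).
sim = np.array([
    [0.9, 0.2, 0.1],   # image 0: correct caption ranked first
    [0.7, 0.6, 0.1],   # image 1: correct caption ranked second
    [0.3, 0.2, 0.8],   # image 2: correct caption ranked first
])

r1 = recall_at_k(sim, 1)
r2 = recall_at_k(sim, 2)
print(f"R@1 = {r1:.3f}, R@2 = {r2:.3f}")
```

Text-to-image retrieval can be scored the same way by passing `sim.T`. Datasets with several captions per image need a ground-truth mapping rather than the diagonal used here.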
