FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval

Jeong-Woo Park, Young-Eun Kim, and Seong-Whan Lee

arXiv:2507.12823·cs.CV·July 18, 2025

Open Access

TL;DR

FAR-Net is a multi-stage fusion framework for composed image retrieval that combines enhanced semantic alignment with adaptive reconciliation of image and text features, improving retrieval accuracy on CIRR and FashionIQ.

Contribution

The paper introduces FAR-Net, which integrates two complementary modules, one for enhanced semantic alignment and one for adaptive reconciliation, to improve alignment and robustness over CIR methods that rely on early or late fusion alone.

Findings

01. Improves Recall@1 by up to 2.4% on CIRR

02. Enhances Recall@50 by 1.04% on FashionIQ

03. Demonstrates robustness and scalability in CIR tasks
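The findings above are reported as Recall@K, the fraction of queries whose target image appears among the top K retrieved candidates. The following is a minimal sketch of that metric, assuming one ground-truth target per query; the function name `recall_at_k`, the image IDs, and the toy rankings are hypothetical and are not taken from the paper or its evaluation code.

```python
def recall_at_k(ranked_ids, target_ids, k):
    """Fraction of queries whose target appears in the top-k ranked candidates.

    ranked_ids: one list of candidate IDs per query, best match first.
    target_ids: the single ground-truth ID for each query (an assumption here).
    """
    if len(ranked_ids) != len(target_ids):
        raise ValueError("ranked_ids and target_ids must have the same length")
    if not target_ids:
        return 0.0
    hits = sum(
        1 for ranked, target in zip(ranked_ids, target_ids) if target in ranked[:k]
    )
    return hits / len(target_ids)


# Toy example: four queries, three ranked candidates each (made-up IDs).
rankings = [
    ["img_3", "img_7", "img_1"],  # target ranked 1st
    ["img_2", "img_5", "img_9"],  # target ranked 2nd
    ["img_4", "img_8", "img_6"],  # target ranked 3rd
    ["img_0", "img_1", "img_2"],  # target not retrieved
]
targets = ["img_3", "img_5", "img_6", "img_9"]

r1 = recall_at_k(rankings, targets, 1)  # 0.25
r3 = recall_at_k(rankings, targets, 3)  # 0.75
```

Under this definition, a reported gain in Recall@1 means more queries place the correct target first, while a gain in Recall@50 means more queries place it anywhere in the top 50.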

Abstract

Composed image retrieval (CIR) is a vision-language task that retrieves a target image using a reference image and modification text, enabling intuitive specification of desired changes. While effectively fusing visual and textual modalities is crucial, existing methods typically adopt either early or late fusion. Early fusion tends to focus excessively on explicitly mentioned textual details and neglect visual context, whereas late fusion struggles to capture fine-grained semantic alignments between image regions and textual tokens. To address these issues, we propose FAR-Net, a multi-stage fusion framework designed with enhanced semantic alignment and adaptive reconciliation, integrating two complementary modules. The enhanced semantic alignment module (ESAM) employs late fusion with cross-attention to capture fine-grained semantic relationships, while the adaptive reconciliation…
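To make the idea of cross-attention between textual tokens and image regions concrete, here is a minimal NumPy sketch of generic single-head scaled dot-product cross-attention. It is not the paper's ESAM: the choice of text tokens as queries and image regions as keys and values, the feature dimensions, and the random projection matrices `w_q`, `w_k`, `w_v` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(text_tokens, image_regions, w_q, w_k, w_v):
    """Single-head cross-attention: text tokens attend over image regions.

    text_tokens:   (T, d_in) features from a text encoder (assumed precomputed).
    image_regions: (R, d_in) features from an image encoder (assumed precomputed).
    Returns the attended features (T, d) and the attention weights (T, R).
    """
    q = text_tokens @ w_q                      # queries from text, (T, d)
    k = image_regions @ w_k                    # keys from image regions, (R, d)
    v = image_regions @ w_v                    # values from image regions, (R, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # token-to-region similarity, (T, R)
    weights = softmax(scores, axis=-1)         # each token's weights sum to 1
    return weights @ v, weights


# Hypothetical sizes: 4 text tokens, 6 image regions, 8-dimensional features.
rng = np.random.default_rng(0)
text_tokens = rng.normal(size=(4, 8))
image_regions = rng.normal(size=(6, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))

attended, weights = cross_attention(text_tokens, image_regions, w_q, w_k, w_v)
```

Each row of `weights` indicates how strongly one text token attends to each image region, which is the kind of token-to-region alignment the abstract describes as fine-grained.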

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques