Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval
Yiming Wu, Hangfei Li, Fangfang Wang, Yilong Zhang, Ronghua Liang

TL;DR
This paper introduces a Self-distilled Dynamic Fusion Network that adaptively combines image and text features for improved language-based fashion retrieval, addressing the rigidity of static fusion methods.
Contribution
The paper proposes a novel dynamic fusion architecture with modality-specific routers and a self path distillation loss, enhancing flexibility and accuracy in multimodal feature integration.
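To make the idea of modality-specific routers concrete, here is a minimal NumPy sketch of one dynamic fusion layer. It is an illustration under stated assumptions, not the authors' implementation: the linear routers, the three candidate fusion operations (sum, element-wise product, image passthrough), the averaging of the two routers' outputs, and the names `DynamicFusionLayer`, `w_img`, and `w_txt` are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class DynamicFusionLayer:
    """Hypothetical layer: each modality owns a router that scores
    a small set of candidate fusion operations (an assumption)."""

    def __init__(self, dim, rng, num_ops=3):
        # One linear router per modality (assumed form).
        self.w_img = rng.normal(scale=0.1, size=(dim, num_ops))
        self.w_txt = rng.normal(scale=0.1, size=(dim, num_ops))

    def __call__(self, img, txt):
        # Each router sees only its own modality's features.
        p_img = softmax(img @ self.w_img)          # (batch, num_ops)
        p_txt = softmax(txt @ self.w_txt)          # (batch, num_ops)
        probs = 0.5 * (p_img + p_txt)              # combined routing weights

        # Candidate fusion operations; these three are illustrative only.
        candidates = np.stack([img + txt, img * txt, img], axis=1)

        # Soft routing: per-sample weighted mix of the candidates.
        fused = (probs[:, :, None] * candidates).sum(axis=1)
        return fused, probs
```

Calling the layer on a batch of image and text features of shape `(batch, dim)` returns the fused features and the per-sample routing weights, so different image-text pairs can follow different fusion paths instead of one fixed operation.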
Findings
Outperforms existing methods on fashion retrieval benchmarks.
Demonstrates improved feature routing stability and accuracy.
Effectively refines path decisions through self-distillation.
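One plausible reading of refining path decisions through self-distillation is sketched below. The summary does not state the exact form of the loss, so everything here is an assumption: the final layer's routing distribution is treated as a fixed teacher, earlier layers are students, and the penalty is a KL divergence. The function name `path_distillation_loss` is hypothetical.

```python
import numpy as np

def path_distillation_loss(layer_probs, eps=1e-8):
    """Assumed form of a self path distillation loss.

    layer_probs: list of arrays, each (batch, num_ops), holding the
    routing probabilities produced at successive fusion layers.
    """
    # Assumption: the last layer's routing acts as the teacher.
    # In a training framework it would be detached from the gradient.
    teacher = layer_probs[-1]
    students = layer_probs[:-1]
    if not students:
        return 0.0

    losses = []
    for student in students:
        # KL(teacher || student), summed over ops, averaged over the batch.
        kl = (teacher * (np.log(teacher + eps) - np.log(student + eps))).sum(axis=-1)
        losses.append(kl.mean())
    return float(np.mean(losses))
```

Under this assumed formulation the loss is zero when every layer agrees with the final routing and grows as earlier layers diverge from it, which is one way to encourage consistent routing paths.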
Abstract
In the domain of language-based fashion image retrieval, pinpointing the desired fashion item using both a reference image and its accompanying textual description is an intriguing challenge. Existing approaches lean heavily on static fusion techniques, intertwining image and text. Despite their commendable advancements, these approaches are still limited by a deficiency in flexibility. In response, we propose a Self-distilled Dynamic Fusion Network to compose the multi-granularity features dynamically by considering the consistency of routing path and modality-specific information simultaneously. Two new modules are included in our proposed method: (1) Dynamic Fusion Network with Modality Specific Routers. The dynamic network enables a flexible determination of the routing for each reference image and modification text, taking into account their distinct semantics and distributions.…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Computational and Text Analysis Methods
