FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval

Dehong Gao, Linbo Jin, Ben Chen, Minghui Qiu, Peng Li, Yi Wei, Yi Hu, and Hao Wang

arXiv:2005.09801 · cs.IR · June 1, 2020 · 6 citations



TL;DR

FashionBERT introduces a novel approach for fashion cross-modal retrieval by leveraging image patches and an adaptive loss, significantly improving matching accuracy over existing methods.

Contribution

It proposes FashionBERT, a model that uses patches as image features and an adaptive loss to enhance fine-grained fashion text-image matching.

Findings

01. Achieves significant performance improvements over baselines.

02. Demonstrates the effectiveness of patch-based image features.

03. Provides a detailed analysis of matching performance and efficiency.

Abstract

In this paper, we address text and image matching for cross-modal retrieval in the fashion industry. Unlike matching in the general domain, fashion matching must pay much more attention to the fine-grained information in fashion images and texts. Pioneering approaches detect regions of interest (RoIs) in images and use the RoI embeddings as image representations. In general, RoIs tend to represent "object-level" information in fashion images, while fashion texts tend to describe more detailed information, e.g., styles and attributes. RoIs are thus not fine-grained enough for fashion text and image matching. To this end, we propose FashionBERT, which leverages patches as image features. With the pre-trained BERT model as the backbone network, FashionBERT learns high-level representations of texts and images. Meanwhile, we propose an…
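The abstract contrasts RoI features with patch features. As a rough illustration only, the NumPy sketch below cuts an image into a fixed grid of patches, orders them row by row, and appends them after the text-token embeddings. This yields one sequence that a single BERT-style encoder could consume. Several details are assumptions and do not come from FashionBERT's actual configuration: the 8x8 grid, the use of raw pixels as patch vectors (a real system would more likely encode each patch with a pretrained CNN), and the random projection `proj`. The helper names `image_to_patches` and `build_joint_sequence` are hypothetical. Position and segment embeddings are omitted.

```python
import numpy as np


def image_to_patches(image: np.ndarray, grid: int) -> np.ndarray:
    """Split an (H, W, C) image into grid * grid flattened patches.

    Returns shape (grid * grid, patch_h * patch_w * C), ordered row by row
    (left to right, top to bottom). The fixed-grid split is an assumption.
    """
    h, w, c = image.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = h // grid, w // grid
    # (H, W, C) -> (grid, ph, grid, pw, C) -> (grid, grid, ph, pw, C)
    patches = image.reshape(grid, ph, grid, pw, c).transpose(0, 2, 1, 3, 4)
    # Flatten each patch into one vector: (grid * grid, ph * pw * C)
    return patches.reshape(grid * grid, ph * pw * c)


def build_joint_sequence(text_emb: np.ndarray, patches: np.ndarray,
                         proj: np.ndarray) -> np.ndarray:
    """Append projected patch features after the text token embeddings.

    text_emb: (num_tokens, hidden); patches: (num_patches, patch_dim);
    proj: (patch_dim, hidden), a hypothetical stand-in for a learned
    image-feature projection into the encoder's hidden size.
    """
    patch_emb = patches @ proj  # (num_patches, hidden)
    return np.concatenate([text_emb, patch_emb], axis=0)


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))             # toy 64x64 RGB image
patches = image_to_patches(image, grid=8)   # 64 patches, each 8*8*3 = 192 values
hidden = 32
text_emb = rng.random((10, hidden))         # 10 toy text-token embeddings
proj = rng.standard_normal((patches.shape[1], hidden)) * 0.02
seq = build_joint_sequence(text_emb, patches, proj)
print(patches.shape, seq.shape)  # (64, 192) (74, 32)
```

Unlike a handful of object-level RoIs, a grid covers every region of the image. Local details such as a collar or a print can therefore land in their own patch, which is the kind of fine-grained signal the abstract argues fashion texts refer to.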



Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Linear Layer · Adaptive Robust Loss · Weight Decay · Softmax · Adam · Multi-Head Attention · Dropout · Attention Dropout · Linear Warmup With Linear Decay