FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval
Dehong Gao, Linbo Jin, Ben Chen, Minghui Qiu, Peng Li, Yi Wei, Yi Hu, and Hao Wang

TL;DR
FashionBERT introduces a novel approach for fashion cross-modal retrieval by leveraging image patches and an adaptive loss, significantly improving matching accuracy over existing methods.
Contribution
It proposes FashionBERT, a model that uses patches as image features and an adaptive loss to enhance fine-grained fashion text-image matching.
Findings
Achieves significant performance improvements over baselines.
Demonstrates effectiveness of patch-based image features.
Provides detailed analysis of matching performance and efficiency.
Abstract
In this paper, we address the text and image matching in cross-modal retrieval of the fashion industry. Different from the matching in the general domain, the fashion matching is required to pay much more attention to the fine-grained information in the fashion images and texts. Pioneer approaches detect the region of interests (i.e., RoIs) from images and use the RoI embeddings as image representations. In general, RoIs tend to represent the "object-level" information in the fashion images, while fashion texts are prone to describe more detailed information, e.g. styles, attributes. RoIs are thus not fine-grained enough for fashion text and image matching. To this end, we propose FashionBERT, which leverages patches as image features. With the pre-trained BERT model as the backbone network, FashionBERT learns high level representations of texts and images. Meanwhile, we propose an…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
MethodsLinear Layer · Adaptive Robust Loss · Weight Decay · Softmax · Adam · Multi-Head Attention · Dropout · Refunds@Expedia|||How do I get a full refund from Expedia? · Attention Dropout · Linear Warmup With Linear Decay
