Unimodal vs. Multimodal Siamese Networks for Outfit Completion
Mariya Hendriksen, Viggo Overes

TL;DR
This paper investigates the effectiveness of unimodal versus multimodal Siamese networks in predicting missing items in fashion outfits, demonstrating that combining visual and textual data improves performance in the Fill in the Blank task.
Contribution
It explores how multimodal data integration in Siamese networks enhances outfit completion accuracy compared to unimodal approaches.
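To make the unimodal/multimodal contrast concrete, here is a minimal sketch of a Siamese encoder in which one shared set of weights embeds every item. It is not the authors' implementation. It assumes pre-extracted visual and textual feature vectors and fuses them by simple concatenation (early fusion) before a linear projection. The weights are random rather than learned. `SiameseEncoder`, `encode`, and all feature values are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


class SiameseEncoder:
    """Hypothetical shared-weight encoder: every item passes through the same
    projection, which is what makes the two branches of the pair 'Siamese'."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.in_dim = in_dim
        # One weight matrix (out_dim x in_dim) shared by both branches.
        # Random here for illustration only; a real model would learn it.
        self.weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)]
                        for _ in range(out_dim)]

    def encode(self, visual: Sequence[float],
               textual: Optional[Sequence[float]] = None) -> List[float]:
        # Unimodal: visual features only. Multimodal (assumed early fusion):
        # concatenate the visual and textual feature vectors.
        features = list(visual) + (list(textual) if textual is not None else [])
        if len(features) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} features, got {len(features)}")
        projected = [sum(w * x for w, x in zip(row, features))
                     for row in self.weights]
        return l2_normalize(projected)


# Usage: a multimodal encoder (4 visual + 3 textual dims, hypothetical sizes)
# embeds two items with the same weights, then compares them.
encoder = SiameseEncoder(in_dim=4 + 3, out_dim=8)
top = encoder.encode([0.2, 0.1, 0.9, 0.4], [1.0, 0.0, 0.3])
skirt = encoder.encode([0.3, 0.2, 0.8, 0.5], [0.9, 0.1, 0.2])
print(round(cosine(top, skirt), 3))
```

Training (for example with a contrastive or triplet loss that pulls compatible items together) is omitted. The sketch only shows that the unimodal and multimodal variants differ in what is fed into the shared encoder. The comparison step is the same for both.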
Findings
Multimodal Siamese networks outperform unimodal models.
Combining visual and textual data yields promising results.
The approach advances outfit recommendation accuracy.
Abstract
The popularity of online fashion shopping continues to grow, making the ability to offer effective recommendations to customers increasingly important. In this work, we focus on the Fashion Outfits Challenge, part of the SIGIR 2022 Workshop on eCommerce. The challenge is centered around the Fill in the Blank (FITB) task, which involves predicting the missing item of an outfit, given an incomplete outfit and a list of candidates. In this paper, we apply Siamese networks to the task. More specifically, we explore how combining information from multiple modalities (textual and visual) affects the model's performance on the task. We evaluate our model on the test split provided by the challenge organizers and on a test split with gold assignments that we created during the development phase. We find that using visual data alone, as well as combined visual and textual data, demonstrates promising results…
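The FITB setup described in the abstract can be illustrated with a small scoring routine. This is a minimal sketch under stated assumptions, not the challenge's official evaluation code. It assumes each item has already been embedded, for example by a Siamese encoder. Each candidate is scored by its mean cosine similarity to the items in the incomplete outfit. Accuracy is taken to be the fraction of questions where the top-scoring candidate matches the gold answer. `fitb_predict`, `fitb_accuracy`, and the toy vectors are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fitb_predict(outfit: List[Vector], candidates: List[Vector]) -> int:
    """Return the index of the candidate most compatible with the partial outfit.

    Compatibility is assumed (hypothetically) to be the mean cosine similarity
    between the candidate and every item already in the outfit."""
    def score(candidate: Vector) -> float:
        return sum(cosine(item, candidate) for item in outfit) / len(outfit)
    return max(range(len(candidates)), key=lambda i: score(candidates[i]))


def fitb_accuracy(questions: Iterable[Tuple[List[Vector], List[Vector], int]]) -> float:
    """questions: (outfit, candidates, gold_index) triples; returns the hit rate."""
    questions = list(questions)
    correct = sum(fitb_predict(o, c) == gold for o, c, gold in questions)
    return correct / len(questions)


# Usage with toy 2-D embeddings: candidate 1 points the same way as the outfit.
outfit = [[1.0, 0.0], [0.9, 0.1]]
candidates = [[0.0, 1.0], [1.0, 0.05], [-1.0, 0.0]]
print(fitb_predict(outfit, candidates))  # → 1
print(fitb_accuracy([(outfit, candidates, 1), (outfit, candidates, 0)]))  # → 0.5
```

Averaging pairwise similarities keeps the scorer independent of outfit length. A learned aggregation over the outfit would be an alternative design.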
