VICTOR: Visual Incompatibility Detection with Transformers and Fashion-specific contrastive pre-training
Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Ioannis Kompatsiaris

TL;DR
VICTOR is a transformer-based model that detects visual incompatibility in fashion outfits, using contrastive pre-training and a new dataset, achieving high accuracy while significantly reducing computational costs.
Contribution
The paper introduces VICTOR, a novel transformer architecture for fashion compatibility detection, and a new dataset, Polyvore-MISFITs, improving accuracy and efficiency over existing methods.
Findings
VICTOR surpasses the state of the art on Polyvore datasets.
Reduces floating-point operations by 88% compared to previous models.
Effective in both overall compatibility regression and item mismatch detection.
Abstract
For fashion outfits to be considered aesthetically pleasing, the garments that constitute them need to be compatible in terms of visual aspects, such as style, category and color. Previous works have defined visual compatibility as a binary classification task, with the items in an outfit being considered as fully compatible or fully incompatible. However, this is not applicable to Outfit Maker applications, where users create their own outfits and need to know which specific items may be incompatible with the rest of the outfit. To address this, we propose the Visual InCompatibility TransfORmer (VICTOR), which is optimized for two tasks: 1) overall compatibility as regression and 2) the detection of mismatching items. We also utilize fashion-specific contrastive language-image pre-training to fine-tune computer vision neural networks on fashion imagery. We build upon the Polyvore outfit…
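The two tasks named in the abstract, overall compatibility as regression and detection of mismatching items, can be illustrated with a small joint objective. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the choice of squared error plus binary cross-entropy, the `weight` parameter, and the way the target score is derived from the item labels are all hypothetical and are not taken from the paper.

```python
import math

def mse(pred, target):
    """Squared error for the outfit-level compatibility score."""
    return (pred - target) ** 2

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over per-item mismatch probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def joint_loss(score_pred, score_true, item_probs, item_labels, weight=1.0):
    """Hypothetical combined objective: regression term plus weighted detection term."""
    detection = binary_cross_entropy(item_probs, item_labels)
    return mse(score_pred, score_true) + weight * detection

# Toy outfit with four items; the third item is labeled as the mismatch (1).
item_labels = [0, 0, 1, 0]
# Assumption for illustration only: the target score is the share of matching items.
score_true = 1.0 - sum(item_labels) / len(item_labels)
loss = joint_loss(0.7, score_true, [0.1, 0.2, 0.8, 0.1], item_labels)
print(round(loss, 4))
```

In this sketch a single scalar `weight` balances the two terms. The loss approaches zero only when the predicted score matches the target and every item is flagged correctly, which is the behavior a two-task setup of this kind is meant to encourage.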
Taxonomy
Topics: Visual Attention and Saliency Detection · Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis
