Improving Outfit Recommendation with Co-supervision of Fashion Generation
Yujie Lin, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma, Maarten de Rijke

TL;DR
This paper introduces FARM, a neural co-supervision framework that enhances outfit recommendation by integrating aesthetic-aware generation and a novel matching mechanism, leading to improved recommendation accuracy and better feature encoding.
Contribution
FARM is the first framework to jointly optimize visual understanding and matching using generation supervision, significantly improving outfit recommendation performance.
Findings
FARM outperforms state-of-the-art models on public datasets.
It encodes better aesthetic features for improved recommendations.
Generated images serve as high-quality references that enhance recommendation accuracy.
Abstract
The task of fashion recommendation includes two main challenges: visual understanding and visual matching. Visual understanding aims to extract effective visual features. Visual matching aims to model a human notion of compatibility to compute a match between fashion items. Most previous studies rely on recommendation loss alone to guide visual understanding and matching. Although the features captured by these methods describe basic characteristics (e.g., color, texture, shape) of the input items, they are not directly related to the visual signals of the output items (to be recommended). This is problematic because the aesthetic characteristics (e.g., style, design), based on which we can directly infer the output items, are lacking. Features are learned under the recommendation loss alone, where the supervision signal is simply whether the given two items are matched or not. To…
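The contrast drawn above, between learning from a match/no-match signal alone and adding supervision from generation, can be sketched as a combined objective. This is only an illustrative sketch, not FARM's actual loss, which this excerpt does not state. The function names, the use of binary cross-entropy for the match signal, the use of mean squared error for the generated image, and the `gen_weight` parameter are all assumptions made for the example.

```python
import math


def recommendation_loss(score, matched):
    """Binary cross-entropy on a predicted match score in (0, 1).

    `matched` is 1 if the two items are a known match, else 0.
    (Assumed form of the "matched or not" supervision signal.)
    """
    eps = 1e-12
    score = min(max(score, eps), 1.0 - eps)  # avoid log(0)
    return -(matched * math.log(score) + (1 - matched) * math.log(1.0 - score))


def generation_loss(generated, target):
    """Mean squared error between generated and target pixel values.

    Both arguments are flat lists of floats of equal length.
    (Assumed stand-in for a generation objective.)
    """
    if len(generated) != len(target):
        raise ValueError("generated and target must have the same length")
    return sum((g - t) ** 2 for g, t in zip(generated, target)) / len(target)


def co_supervised_loss(score, matched, generated, target, gen_weight=0.5):
    """Recommendation loss plus a weighted generation loss.

    `gen_weight` is a hypothetical trade-off hyperparameter.
    """
    return recommendation_loss(score, matched) + gen_weight * generation_loss(
        generated, target
    )


if __name__ == "__main__":
    # Toy values: an uncertain match score and a two-pixel "image".
    loss = co_supervised_loss(0.5, 1, [0.0, 0.0], [1.0, 1.0], gen_weight=0.5)
    print(loss)
```

With `gen_weight=0` the objective reduces to the recommendation loss alone, which is the setting the abstract attributes to most previous studies.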
