FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing
Morayo Danielle Adeyemi, Ryan A. Rossi, Franck Dernoncourt

TL;DR
FASH-iCNN is a multimodal system trained on Vogue images that makes the cultural logic of fashion houses, eras, and color traditions inspectable, revealing which visual features encode editorial identity.
Contribution
The paper introduces a system that not only predicts fashion attributes but also makes the way cultural logic is encoded transparent and interpretable.
Findings
A clothing-only model identifies the fashion house at 78.2% top-1 accuracy across 14 houses.
Texture and luminance are primary carriers of editorial identity.
Removing color costs only 10.6pp of house identification accuracy, versus 37.6pp for removing texture.
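The color and texture findings rest on ablating one visual channel at a time and re-measuring accuracy. This summary does not say how the paper implements those ablations, so the following is only a minimal sketch under two stated assumptions: "removing color" is approximated by a luminance-only (grayscale) conversion, and "removing texture" by a box blur. The function names `remove_color` and `remove_texture` are hypothetical, and the image is a plain nested list of RGB tuples to keep the example dependency-free.

```python
# Illustrative channel ablations on a tiny RGB image stored as nested lists.
# ASSUMPTION: the paper's actual ablation procedure is not given in this summary;
# grayscale conversion and box blur are stand-ins chosen for illustration only.

def remove_color(image):
    """Replace each RGB pixel with its Rec. 601 luma, copied to all three channels."""
    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            y = 0.299 * r + 0.587 * g + 0.114 * b  # standard luma weights
            new_row.append((y, y, y))
        out.append(new_row)
    return out


def remove_texture(image, radius=1):
    """Box-blur each channel so fine spatial detail is averaged away
    while coarse color is retained. Windows are clipped at the borders."""
    height, width = len(image), len(image[0])
    out = []
    for i in range(height):
        new_row = []
        for j in range(width):
            window = [
                image[y][x]
                for y in range(max(0, i - radius), min(height, i + radius + 1))
                for x in range(max(0, j - radius), min(width, j + radius + 1))
            ]
            new_row.append(
                tuple(sum(p[c] for p in window) / len(window) for c in range(3))
            )
        out.append(new_row)
    return out


# A 2x2 toy image: red, green, blue, and white pixels.
toy_image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
gray_image = remove_color(toy_image)       # luminance kept, hue discarded
smooth_image = remove_texture(toy_image)   # local detail averaged out
```

In a probing setup of this kind, the same trained classifier would be evaluated on the original, color-ablated, and texture-ablated inputs, and the accuracy drop for each ablation indicates how much that channel contributes.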
Abstract
Fashion AI systems routinely encode the aesthetic logic of specific houses, editors, and historical moments without disclosing it. We present FASH-iCNN, a multimodal system trained on 87,547 Vogue runway images across 15 fashion houses spanning 1991-2024 that makes this cultural logic inspectable. Given a photograph of a garment, the system recovers which house produced it, which era it belongs to, and which color tradition it reflects. A clothing-only model identifies the fashion house at 78.2% top-1 across 14 houses, the decade at 88.6% top-1, and the specific year at 58.3% top-1 across 34 years with a mean error of just 2.2 years. Probing which visual channels carry this signal reveals a sharp dissociation: removing color costs only 10.6pp of house identity accuracy, while removing texture costs 37.6pp, establishing texture and luminance as the primary carriers of editorial identity.…
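The abstract reports three kinds of numbers: top-1 accuracy, a mean error in years, and ablation costs in percentage points (pp). As a rough sketch of how such quantities are commonly computed, the helpers below use toy labels rather than the paper's data; the function names are hypothetical, and treating "mean error" as mean absolute error is an assumption, since the summary does not define it.

```python
# Toy illustration of the metric types quoted in the abstract.
# ASSUMPTION: "mean error" is read as mean absolute error in years;
# all labels below are made up for the example and are not from the paper.

def top1_accuracy(predicted, actual):
    """Percentage of samples whose single best prediction matches the label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def mean_absolute_year_error(predicted_years, actual_years):
    """Average absolute gap, in years, between predicted and true year."""
    gaps = [abs(p - a) for p, a in zip(predicted_years, actual_years)]
    return sum(gaps) / len(gaps)


def ablation_cost_pp(baseline_accuracy, ablated_accuracy):
    """Accuracy lost to an ablation, in percentage points (absolute difference,
    not a relative percentage of the baseline)."""
    return baseline_accuracy - ablated_accuracy


# Hypothetical house labels: 4 of 5 predictions are correct.
pred_houses = ["A", "B", "C", "A", "D"]
true_houses = ["A", "B", "C", "B", "D"]
house_acc = top1_accuracy(pred_houses, true_houses)

# Hypothetical year labels: absolute errors of 2, 0, 3, and 1 years.
pred_years = [1995, 2003, 2010, 2021]
true_years = [1993, 2003, 2013, 2020]
year_mae = mean_absolute_year_error(pred_years, true_years)

# Hypothetical accuracies before and after an ablation.
cost = ablation_cost_pp(80.0, 60.0)
```

Note that a percentage-point cost is an absolute difference between two accuracies, so it should not be read as a relative drop.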
