Pose Guided Attention for Multi-label Fashion Image Classification
Beatriz Quintino Ferreira, João P. Costeira, Ricardo G. Sousa, Liang-Yan Gui, João P. Gomes

TL;DR
This paper introduces a pose-guided attention framework for multi-label fashion image classification. It outperforms the state of the art on an in-house dataset and matches previous methods on DeepFashion without relying on landmark annotations, while improving robustness to wrong annotations and interpretability.
Contribution
The paper presents a novel visual semantic attention model supervised by automatic pose extraction, improving multi-label classification in fashion images without landmark annotations.
Findings
Outperforms the state of the art on an in-house dataset.
Performs on par with previous methods on DeepFashion without landmark annotations.
Enhances robustness to incorrect annotations and improves interpretability.
Abstract
We propose a compact framework with guided attention for multi-label classification in the fashion domain. Our visual semantic attention model (VSAM) is supervised by automatic pose extraction creating a discriminative feature space. VSAM outperforms the state of the art for an in-house dataset and performs on par with previous works on the DeepFashion dataset, even without using any landmark annotations. Additionally, we show that our semantic attention module brings robustness to large quantities of wrong annotations and provides more interpretable results.
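The abstract does not spell out the architecture, so the following is only a minimal sketch of what attention guided by pose could look like, not the authors' implementation. It assumes a softmax spatial attention over image locations, a mean-squared-error guidance term that pulls the attention map toward a normalised pose heatmap, and a per-label sigmoid cross-entropy for the multi-label output. All function names (`attention_pool`, `guidance_loss`, `multilabel_bce`) and the choice of losses are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_pool(features, attn_logits):
    """Pool N location features (each of length D) with softmax attention.

    Returns the attention weights and the attention-weighted feature vector.
    """
    attn = softmax(attn_logits)
    dim = len(features[0])
    pooled = [sum(a * f[d] for a, f in zip(attn, features)) for d in range(dim)]
    return attn, pooled


def guidance_loss(attn, pose_heatmap):
    """Assumed guidance term: MSE between the attention map and the
    pose heatmap normalised to sum to 1."""
    total = sum(pose_heatmap)
    target = [p / total for p in pose_heatmap]
    return sum((a - t) ** 2 for a, t in zip(attn, target)) / len(attn)


def multilabel_bce(logits, labels):
    """Mean binary cross-entropy with one independent sigmoid per label."""
    eps = 1e-12
    losses = []
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        losses.append(-(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)))
    return sum(losses) / len(losses)


# Toy example: 4 image locations with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
attn, pooled = attention_pool(features, attn_logits=[2.0, 0.5, 0.5, -1.0])
loss = multilabel_bce(logits=pooled, labels=[1, 0]) \
    + guidance_loss(attn, pose_heatmap=[3.0, 1.0, 1.0, 0.0])
```

In this sketch the pose heatmap plays the role that landmark annotations would otherwise play: it is produced automatically and only shapes where the attention looks, while the class labels drive the classification term.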
