Pose Guided Attention for Multi-label Fashion Image Classification

Beatriz Quintino Ferreira; João P. Costeira; Ricardo G. Sousa; Liang-Yan Gui; João P. Gomes

arXiv:1911.05024·cs.CV·November 13, 2019

TL;DR

This paper introduces a pose-guided attention framework for multi-label fashion image classification. It matches or outperforms state-of-the-art methods without relying on landmark annotations, and it improves both robustness to wrong annotations and interpretability.

Contribution

The paper presents a visual semantic attention model (VSAM) supervised by automatic pose extraction, which improves multi-label classification of fashion images without requiring landmark annotations.

Findings

01 Outperforms the state of the art on an in-house dataset.

02 Performs on par with previous methods on DeepFashion without landmark annotations.

03 Enhances robustness to incorrect annotations and improves interpretability.

Abstract

We propose a compact framework with guided attention for multi-label classification in the fashion domain. Our visual semantic attention model (VSAM) is supervised by automatic pose extraction creating a discriminative feature space. VSAM outperforms the state of the art for an in-house dataset and performs on par with previous works on the DeepFashion dataset, even without using any landmark annotations. Additionally, we show that our semantic attention module brings robustness to large quantities of wrong annotations and provides more interpretable results.
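
The abstract does not specify the architecture or the losses, so the following is only a minimal sketch of the general idea of pose-guided attention for multi-label classification, not the authors' VSAM implementation. It assumes softmax spatial attention, a per-label sigmoid cross-entropy loss, and a mean-squared-error term that pulls the attention map toward a pose-derived heatmap. All function names, the `guidance_weight` parameter, and the choice of losses are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_pool(features, attention_logits):
    """Pool per-location feature vectors with a softmax attention map.

    features: list of N feature vectors (each a list of C floats).
    attention_logits: list of N unnormalized attention scores.
    Returns (pooled_vector, attention_map).
    """
    m = max(attention_logits)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in attention_logits]
    total = sum(exps)
    attention = [e / total for e in exps]
    channels = len(features[0])
    pooled = [
        sum(w * f[c] for w, f in zip(attention, features))
        for c in range(channels)
    ]
    return pooled, attention


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent labels (multi-label setting)."""
    eps = 1e-12
    losses = []
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        losses.append(-(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)))
    return sum(losses) / len(losses)


def attention_guidance_loss(attention, pose_heatmap):
    """Assumed guidance term: MSE between attention and a normalized pose heatmap."""
    total = sum(pose_heatmap)
    if total <= 0:
        raise ValueError("pose_heatmap must have positive total mass")
    target = [h / total for h in pose_heatmap]
    return sum((a - t) ** 2 for a, t in zip(attention, target)) / len(attention)


def total_loss(label_logits, labels, attention, pose_heatmap, guidance_weight=1.0):
    """Classification loss plus the weighted attention-guidance term."""
    return multilabel_bce(label_logits, labels) + guidance_weight * attention_guidance_loss(
        attention, pose_heatmap
    )
```

In this sketch the pose heatmap acts only as a training-time target for the attention map, which is one way a model could be guided by pose without consuming landmark annotations at inference time.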
