Deep View-Sensitive Pedestrian Attribute Inference in an end-to-end Model
M. Saquib Sarfraz, Arne Schumann, Yan Wang, Rainer Stiefelhagen

TL;DR
This paper introduces an end-to-end, view-sensitive model for pedestrian attribute inference that jointly predicts a pedestrian's coarse pose (view) and attributes, improving accuracy on challenging surveillance datasets.
Contribution
It proposes a novel end-to-end framework that jointly predicts pedestrian view and attributes, enhancing attribute inference accuracy over existing methods.
Findings
Improved attribute prediction performance on PETA, RAP, and WIDER datasets.
View-sensitive model outperforms state-of-the-art methods.
Joint pose and attribute prediction benefits from shared learning.
Abstract
Pedestrian attribute inference is a demanding problem in visual surveillance that can facilitate person retrieval, search and indexing. To exploit semantic relations between attributes, recent research treats it as a multi-label image classification task. The visual cues hinting at attributes can be strongly localized, and inference of person attributes such as hair, backpack, shorts, etc., is highly dependent on the acquired view of the pedestrian. In this paper we assert this dependence in an end-to-end learning framework and show that a view-sensitive attribute inference is able to learn better attribute predictions. Our proposed model jointly predicts the coarse pose (view) of the pedestrian and learns specialized view-specific multi-label attribute predictions. We show in an extensive evaluation on three challenging datasets (PETA, RAP and WIDER) that our proposed end-to-end…
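The idea of combining a coarse view prediction with view-specific multi-label attribute predictions can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the two outputs are combined, so weighting each view's attribute logits by the predicted view probability is an assumption made here for illustration. The function name `view_weighted_attributes`, the three views, and all numbers are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw view scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Per-attribute probability for multi-label prediction."""
    return 1.0 / (1.0 + math.exp(-x))


def view_weighted_attributes(view_scores, per_view_attr_logits):
    """Blend view-specific attribute logits by predicted view probability.

    ASSUMPTION: the weighted-sum combination is illustrative only; the
    source text states only that view and attributes are predicted jointly.

    view_scores: one raw score per coarse view.
    per_view_attr_logits: one list of attribute logits per view.
    """
    view_probs = softmax(view_scores)
    num_attrs = len(per_view_attr_logits[0])
    combined = [
        sum(p * logits[a] for p, logits in zip(view_probs, per_view_attr_logits))
        for a in range(num_attrs)
    ]
    return view_probs, [sigmoid(z) for z in combined]


# Hypothetical example: views are (front, back, side),
# attributes are (backpack, shorts).
view_scores = [0.2, 3.0, 0.1]  # the view head favours "back"
per_view_attr_logits = [
    [-2.0, 1.0],  # front-view head: backpack is hard to see
    [4.0, 1.0],   # back-view head: backpack is clearly visible
    [0.5, 1.0],   # side-view head
]
view_probs, attr_probs = view_weighted_attributes(view_scores, per_view_attr_logits)
print(view_probs, attr_probs)
```

In this sketch the back-view head dominates the backpack prediction because the view probabilities put most of their mass on "back", which is the kind of view dependence the abstract describes for attributes such as backpack or hair.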
