Scale Coding Bag of Deep Features for Human Attribute and Action Recognition

Fahad Shahbaz Khan; Joost van de Weijer; Rao Muhammad Anwer; Andrew D. Bagdanov; Michael Felsberg; Jorma Laaksonen

arXiv:1612.04884·cs.CV·March 28, 2018

TL;DR

This paper introduces scale coding techniques within a Bag of Deep Features framework to explicitly encode multi-scale information, improving human attribute and action recognition accuracy across multiple datasets.

Contribution

The paper proposes two novel scale coding strategies that explicitly incorporate multi-scale features into deep image representations, outperforming scale-invariant methods.

Findings

01 Outperforms scale-invariant approaches on five datasets

02 Improves recognition accuracy by explicitly encoding scale information

03 Combining scale coding with standard deep features yields state-of-the-art results

Abstract

Most approaches to human attribute and action recognition in still images are based on image representations in which multi-scale local features are pooled across scales into a single, scale-invariant encoding. Both in bag-of-words and in the recently popular representations based on convolutional neural networks, local features are computed at multiple scales. However, these multi-scale convolutional features are pooled into a single scale-invariant representation. We argue that entirely scale-invariant image representations are sub-optimal and investigate approaches to scale coding within a Bag of Deep Features framework. Our approach encodes multi-scale information explicitly during the image encoding stage. We propose two strategies to encode multi-scale information explicitly in the final image representation. We validate our two scale coding techniques on five datasets: Willow,…
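The contrast the abstract draws, between pooling all scales into one vector and keeping scale explicit in the final representation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the average pooling, and the `boundaries` parameter that splits features into scale groups are all assumptions made for illustration, and the toy vectors stand in for local deep features.

```python
from bisect import bisect_right


def mean_pool(vectors, dim):
    """Average a list of equal-length vectors; return zeros for an empty list."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def scale_invariant_encoding(features):
    """Pool local features from every scale into a single vector."""
    return mean_pool(features, len(features[0]))


def scale_coded_encoding(features, scales, boundaries):
    """Pool features separately per scale group, then concatenate the groups.

    `boundaries` (hypothetical parameter) is an ascending list of scale
    thresholds; a feature with scale s falls into group bisect_right(boundaries, s).
    """
    dim = len(features[0])
    groups = [[] for _ in range(len(boundaries) + 1)]
    for vec, s in zip(features, scales):
        groups[bisect_right(boundaries, s)].append(vec)
    encoding = []
    for group in groups:
        encoding.extend(mean_pool(group, dim))
    return encoding


# Toy local features (2-D) extracted at three scales; values are made up.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [5.0, 5.0], [7.0, 7.0]]
scales = [8, 8, 32, 32, 128, 128]

invariant = scale_invariant_encoding(features)
coded = scale_coded_encoding(features, scales, boundaries=[16, 64])
```

In this sketch the scale-invariant vector has the same length as one local feature, while the scale-coded vector is three times longer because each scale group keeps its own pooled block. That extra structure is what lets a classifier weight small-scale and large-scale evidence differently.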
