Contextual Action Recognition with R*CNN

Georgia Gkioxari; Ross Girshick; Jitendra Malik

arXiv:1505.01197 · cs.CV · March 28, 2016 · 72 cites

TL;DR

This paper introduces R*CNN, a novel action recognition system that leverages multiple contextual cues from images, achieving high accuracy and demonstrating versatility in related tasks like attribute classification.

Contribution

The paper proposes R*CNN, an adaptation of RCNN that uses multiple regions for improved action recognition and can be applied to fine-grained attribute classification.

Findings

01 Achieves 90.2% mean AP on the PASCAL VOC Action dataset.

02 Outperforms previous action recognition methods by a significant margin.

03 Attains state-of-the-art results on the Berkeley Attributes of People dataset.

Abstract

There are multiple cues in an image which reveal what action a person is performing. For example, a jogger has a pose that is characteristic of jogging, but the scene (e.g. road, trail) and the presence of other joggers can be an additional source of information. In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system. We adapt RCNN to use more than one region for classification while still maintaining the ability to localize the action. We call our system R*CNN. The action-specific models and the feature maps are trained jointly, allowing for action-specific representations to emerge. R*CNN achieves 90.2% mean AP on the PASCAL VOC Action dataset, outperforming all other approaches in the field by a significant margin. Last, we show that R*CNN is not limited to action recognition. In particular, R*CNN…
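The idea of classifying with more than one region can be illustrated with a small sketch. It assumes, beyond what this abstract excerpt states, that each action's score is the sum of a term for the primary region (the person box) and the best-scoring term among candidate secondary (context) regions, followed by a softmax over actions. The function name `rstar_scores`, the argument names, and the feature dimensions are hypothetical, and the region features here are plain arrays standing in for CNN features.

```python
import numpy as np

def rstar_scores(primary_feat, secondary_feats, w_primary, w_secondary):
    """Combine one primary region with the best secondary region per action.

    primary_feat:    (D,)   feature of the person box (assumed given)
    secondary_feats: (S, D) features of S candidate context regions
    w_primary:       (A, D) per-action weights for the primary region
    w_secondary:     (A, D) per-action weights for secondary regions
    Returns a length-A vector of action probabilities.
    """
    # Score of the primary region for each action.
    primary_term = w_primary @ primary_feat                          # (A,)
    # For each action, keep only the most informative secondary region.
    secondary_term = (secondary_feats @ w_secondary.T).max(axis=0)   # (A,)
    scores = primary_term + secondary_term
    # Numerically stable softmax over actions.
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Toy usage with random stand-in features: 3 actions, 5 context regions.
rng = np.random.default_rng(0)
probs = rstar_scores(
    rng.normal(size=8),
    rng.normal(size=(5, 8)),
    rng.normal(size=(3, 8)),
    rng.normal(size=(3, 8)),
)
```

Taking the maximum over secondary regions lets each action pick its own context (for example, a different region for one action than for another) without requiring annotations for which region is relevant. In the sketch the weights are fixed inputs; joint training of the models and feature maps, as the abstract describes, is not shown.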

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Multimodal Machine Learning Applications