A Better Baseline for AVA

Rohit Girdhar; João Carreira; Carl Doersch; Andrew Zisserman

arXiv:1807.10066·cs.CV·July 27, 2018·44 cites

TL;DR

This paper presents a simple yet effective spatiotemporal action localization baseline for AVA, significantly outperforming previous models by leveraging I3D features within a Faster R-CNN framework.

Contribution

The authors introduce a new baseline for AVA action localization that pairs I3D features with the Faster R-CNN detection framework and outperforms all submissions to the AVA challenge at CVPR 2018.

Findings

01

Achieved 21.9% average AP on the AVA v2.1 validation set.

02

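Outperformed the best RGB spatiotemporal model from the original AVA paper (14.5%), the publicly available ResNet101 baseline (11.3%), and all submissions to the AVA challenge at CVPR 2018.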

03

Demonstrated that spatiotemporal features from an I3D model pretrained on Kinetics are effective for action localization.
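
To put the reported numbers in perspective, the short sketch below computes the absolute and relative gains of the 21.9% validation result over the two earlier baselines quoted in the abstract (14.5% and 11.3%). The function name `gains` is purely illustrative, and the relative figures are derived here; they are not stated in the paper.

```python
def gains(ours: float, baseline: float) -> tuple:
    """Return (absolute gain in AP points, relative gain in percent)."""
    absolute = ours - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


# Average AP (%) on the AVA v2.1 validation set, as quoted in the abstract.
OURS = 21.9
BASELINES = {
    "best RGB spatiotemporal model (original AVA paper)": 14.5,
    "ResNet101 image baseline": 11.3,
}

for name, ap in BASELINES.items():
    absolute, relative = gains(OURS, ap)
    print(f"vs {name}: +{absolute} points ({relative}% relative)")
```

Under this arithmetic the new baseline is about 7.4 points (roughly 51% relative) above the earlier spatiotemporal model and about 10.6 points (roughly 94% relative) above the image-based baseline.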

Abstract

We introduce a simple baseline for action localization on the AVA dataset. The model builds upon the Faster R-CNN bounding box detection framework, adapted to operate on pure spatiotemporal features, in our case produced exclusively by an I3D model pretrained on Kinetics. This model obtains 21.9% average AP on the validation set of AVA v2.1, up from 14.5% for the best RGB spatiotemporal model used in the original AVA paper (which was pretrained on Kinetics and ImageNet), and up from 11.3% for the publicly available baseline using a ResNet101 image feature extractor, which was pretrained on ImageNet. Our final model obtains 22.8%/21.9% mAP on the val/test sets and outperforms all submissions to the AVA challenge at CVPR 2018.
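
As a rough illustration of what running a box-detection head on spatiotemporal features can look like, here is a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not say how the temporal dimension is handled, so the sketch assumes the feature volume is simply averaged over time before a RoIPool-style max pooling is applied to one box. The names `temporal_mean` and `roi_max_pool` are hypothetical, and a single feature channel is used for brevity.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one H x W feature map (single channel for brevity)


def temporal_mean(clip: List[Frame]) -> Frame:
    """Average a T x H x W feature volume over time (an assumption made here)."""
    t = len(clip)
    h, w = len(clip[0]), len(clip[0][0])
    return [[sum(clip[k][i][j] for k in range(t)) / t for j in range(w)]
            for i in range(h)]


def roi_max_pool(fmap: Frame, box: Tuple[int, int, int, int], out: int) -> Frame:
    """Max-pool the region box=(x1, y1, x2, y2) into an out x out grid.

    Coordinates are feature-map cells, with x2 and y2 exclusive. The box must
    lie inside the map and span at least one cell in each direction.
    """
    x1, y1, x2, y2 = box
    pooled = []
    for by in range(out):
        # Integer bin edges; every bin covers at least one cell.
        ys = y1 + (by * (y2 - y1)) // out
        ye = max(ys + 1, y1 + ((by + 1) * (y2 - y1)) // out)
        row = []
        for bx in range(out):
            xs = x1 + (bx * (x2 - x1)) // out
            xe = max(xs + 1, x1 + ((bx + 1) * (x2 - x1)) // out)
            row.append(max(fmap[i][j]
                           for i in range(ys, ye)
                           for j in range(xs, xe)))
        pooled.append(row)
    return pooled


# Toy clip: two 4 x 4 frames, the second equal to the first plus 2.
frame0 = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
frame1 = [[v + 2.0 for v in row] for row in frame0]

fmap = temporal_mean([frame0, frame1])      # H x W map after time averaging
pooled = roi_max_pool(fmap, (0, 0, 4, 4), 2)  # fixed 2 x 2 descriptor
print(pooled)
```

In a full detector, the fixed-size pooled descriptor for each proposed box would be passed to a classification head; that part is omitted here.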

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis

Methods: Region Proposal Network · Softmax · Convolution · RoIPool · Faster R-CNN