Recognizing Actions in Videos from Unseen Viewpoints
AJ Piergiovanni, Michael S. Ryoo

TL;DR
This paper addresses the challenge of recognizing actions in videos from unseen camera viewpoints by developing 3D-based methods and a new geometric convolutional layer to learn viewpoint-invariant features, supported by a novel dataset.
Contribution
It introduces a geometric convolutional layer and 3D representations for unseen view action recognition, along with a new dataset for evaluating such models.
Findings
The proposed methods improve recognition accuracy on unseen viewpoints.
The new dataset provides a challenging benchmark for future research.
Viewpoint-invariant features are effectively learned using the new approaches.
Abstract
Standard methods for video recognition use large CNNs designed to capture spatio-temporal data. However, training these models requires a large amount of labeled training data, containing a wide variety of actions, scenes, settings and camera viewpoints. In this paper, we show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in their training data (i.e., unseen view action recognition). To address this, we develop approaches based on 3D representations and introduce a new geometric convolutional layer that can learn viewpoint-invariant representations. Further, we introduce a new, challenging dataset for unseen view recognition and show the approaches' ability to learn viewpoint-invariant representations.
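The defining property of the unseen-view setting is that no camera viewpoint used at test time appears in training. The following minimal Python sketch makes that property concrete. It assumes each clip carries a camera-view label. The record layout, the function name `unseen_view_split`, and the view names are hypothetical illustrations, not the paper's dataset format or evaluation protocol.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical clip record: (clip_id, action_label, camera_view).
Clip = Tuple[str, str, str]


def unseen_view_split(clips: List[Clip], test_views: Set[str]) -> Dict[str, List[Clip]]:
    """Partition clips by camera view so test views never occur in training."""
    train = [c for c in clips if c[2] not in test_views]
    test = [c for c in clips if c[2] in test_views]
    return {"train": train, "test": test}


clips = [
    ("c1", "wave", "front"),
    ("c2", "jump", "front"),
    ("c3", "wave", "side"),
    ("c4", "jump", "overhead"),
]
split = unseen_view_split(clips, test_views={"overhead"})
print([c[0] for c in split["train"]])  # ['c1', 'c2', 'c3']
print([c[0] for c in split["test"]])   # ['c4']
```

A random clip-level split, by contrast, would usually place every view in both partitions, so it would not test recognition from unseen viewpoints.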
