Recognizing Actions in Videos from Unseen Viewpoints

AJ Piergiovanni; Michael S. Ryoo

arXiv:2103.16516·cs.CV·March 31, 2021

TL;DR

This paper addresses the challenge of recognizing actions in videos from unseen camera viewpoints by developing 3D-based methods and a new geometric convolutional layer to learn viewpoint-invariant features, supported by a novel dataset.

Contribution

The paper introduces a geometric convolutional layer and 3D representations for unseen-view action recognition, along with a new dataset for evaluating such models.

Findings

01. The proposed methods improve recognition accuracy on unseen viewpoints.

02. The new dataset provides a challenging benchmark for future research.

03. Viewpoint-invariant features are effectively learned using the new approaches.

Abstract

Standard methods for video recognition use large CNNs designed to capture spatio-temporal data. However, training these models requires a large amount of labeled training data, containing a wide variety of actions, scenes, settings and camera viewpoints. In this paper, we show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in their training data (i.e., unseen view action recognition). To address this, we develop approaches based on 3D representations and introduce a new geometric convolutional layer that can learn viewpoint-invariant representations. Further, we introduce a new, challenging dataset for unseen view recognition and show the approaches' ability to learn viewpoint-invariant representations.
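The unseen-view setting described in the abstract can be illustrated with a view-disjoint data split: every camera viewpoint used at test time is withheld from training. The following is a minimal sketch of that idea only, not the paper's protocol or dataset; the function name `split_by_viewpoint`, the `(clip_id, label, view_id)` record layout, and the camera names are all hypothetical.

```python
def split_by_viewpoint(samples, held_out_views):
    """Partition (clip_id, label, view_id) records into train and test sets
    so that no held-out viewpoint ever appears in the training set."""
    held_out = set(held_out_views)
    train = [s for s in samples if s[2] not in held_out]
    test = [s for s in samples if s[2] in held_out]
    return train, test


# Hypothetical toy records: (clip_id, action label, camera viewpoint).
samples = [
    ("clip0", "wave", "cam_front"),
    ("clip1", "wave", "cam_side"),
    ("clip2", "sit", "cam_front"),
    ("clip3", "sit", "cam_top"),
]

train, test = split_by_viewpoint(samples, held_out_views=["cam_side", "cam_top"])

# Collect the viewpoints on each side of the split to confirm they do not overlap.
train_views = {view for _, _, view in train}
test_views = {view for _, _, view in test}
```

With a split like this, accuracy measured on `test` reflects how well a model generalizes to viewpoints it has never observed, rather than to new clips from familiar cameras.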
