Representation Flow for Action Recognition

AJ Piergiovanni; Michael S. Ryoo

arXiv:1810.01455·cs.CV·August 5, 2019

TL;DR

This paper introduces a convolutional layer, inspired by optical flow algorithms, that learns motion representations inside a CNN. The layer is trained end-to-end with the rest of the network and can be stacked, which the authors report improves both the speed and the accuracy of action recognition.

Contribution

The paper presents a fully differentiable representation flow layer for CNNs and introduces stacking such layers to learn 'flow of flow' representations for action recognition.

Findings

01 Outperforms traditional optical flow-based models in speed and accuracy

02 End-to-end training of motion representation layers improves recognition performance

03 Stacked flow layers effectively capture complex motion patterns

Abstract

In this paper, we propose a convolutional layer inspired by optical flow algorithms to learn motion representations. Our representation flow layer is a fully differentiable layer designed to capture the 'flow' of any representation channel within a convolutional neural network for action recognition. Its parameters for iterative flow optimization are learned end-to-end together with the other CNN model parameters, maximizing action recognition performance. Furthermore, we introduce the concept of learning 'flow of flow' representations by stacking multiple representation flow layers. We conducted extensive experimental evaluations, confirming the layer's advantages over previous recognition models that use traditional optical flow, in both computational speed and recognition performance. Code/models available here: https://piergiaj.github.io/rep-flow-site/
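
To make the "iterative flow optimization" in the abstract concrete, the following is a minimal NumPy sketch of a TV-L1 style flow iteration applied to two single-channel feature maps. It is an illustration under stated assumptions, not the authors' implementation: the function name `representation_flow`, the helper names, and the values of `lam`, `theta`, `tau`, and `n_iter` are hypothetical, and the parameters are fixed constants here, whereas the paper learns them together with the CNN weights.

```python
import numpy as np


def grad(a):
    """Forward-difference gradient; the last column/row is left at zero."""
    gx = np.zeros_like(a)
    gy = np.zeros_like(a)
    gx[:, :-1] = a[:, 1:] - a[:, :-1]
    gy[:-1, :] = a[1:, :] - a[:-1, :]
    return gx, gy


def div(px, py):
    """Backward-difference divergence, the negative adjoint of grad()."""
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy


def representation_flow(f1, f2, n_iter=20, lam=0.15, theta=0.3, tau=0.25):
    """Estimate a 2-channel flow field between feature maps f1 and f2.

    Hypothetical fixed hyperparameters; in the paper these are learned.
    Returns an array of shape (2, H, W): horizontal and vertical flow.
    """
    f1 = np.asarray(f1, dtype=np.float64)
    f2 = np.asarray(f2, dtype=np.float64)

    # Spatial gradient of the second map (np.gradient returns axis 0 first).
    gy, gx = np.gradient(f2)
    g2 = gx ** 2 + gy ** 2 + 1e-12  # small constant avoids division by zero
    rho_c = f2 - f1                 # residual at zero flow (no warping)

    u1 = np.zeros_like(f1)
    u2 = np.zeros_like(f1)
    p11 = np.zeros_like(f1)
    p12 = np.zeros_like(f1)
    p21 = np.zeros_like(f1)
    p22 = np.zeros_like(f1)

    for _ in range(n_iter):
        # Linearized brightness-constancy residual for the current flow.
        rho = rho_c + gx * u1 + gy * u2
        thresh = lam * theta * g2
        # Pointwise thresholding step of the TV-L1 scheme.
        step = np.where(rho < -thresh, lam * theta,
                        np.where(rho > thresh, -lam * theta, -rho / g2))
        v1 = u1 + step * gx
        v2 = u2 + step * gy

        # Smoothness update using the dual variables p.
        u1 = v1 + theta * div(p11, p12)
        u2 = v2 + theta * div(p21, p22)

        # Dual variable update.
        u1x, u1y = grad(u1)
        u2x, u2y = grad(u2)
        n1 = 1.0 + (tau / theta) * np.sqrt(u1x ** 2 + u1y ** 2)
        n2 = 1.0 + (tau / theta) * np.sqrt(u2x ** 2 + u2y ** 2)
        p11 = (p11 + (tau / theta) * u1x) / n1
        p12 = (p12 + (tau / theta) * u1y) / n1
        p21 = (p21 + (tau / theta) * u2x) / n2
        p22 = (p22 + (tau / theta) * u2y) / n2

    return np.stack([u1, u2])
```

Calling `representation_flow(f_t, f_t1)` on two consecutive feature maps gives one flow field. A 'flow of flow' in this sketch would be obtained by calling the same function again on each channel of two consecutive flow fields. Every operation is differentiable almost everywhere, which is what would allow the same computation, written in an autodiff framework, to be trained end-to-end.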
