Motion-driven Visual Tempo Learning for Video-based Action Recognition

Yuanzhong Liu; Junsong Yuan; Zhigang Tu

arXiv:2202.12116 · cs.CV · July 13, 2022

TL;DR

This paper introduces a plug-in Temporal Correlation Module (TCM) that enhances video action recognition by capturing fine-grained visual tempo from low-level features, improving performance across multiple benchmarks.

Contribution

The work presents a Temporal Correlation Module (TCM) whose two components, a Multi-scale Temporal Dynamics Module (MTDM) and a Temporal Attention Module (TAM), extract temporal dynamics at a single layer and outperform previous multi-rate sampling methods.

Findings

1. Significant accuracy improvements on benchmarks like Kinetics-400 and Something-Something V2.

2. Effective extraction of both fast and slow temporal dynamics.

3. Plug-in design allows easy integration into existing models.

Abstract

Action visual tempo characterizes the dynamics and the temporal scale of an action, which helps distinguish human actions that are highly similar in visual dynamics and appearance. Previous methods capture the visual tempo either by sampling raw videos at multiple rates, which requires a costly multi-layer network to handle each rate, or by hierarchically sampling backbone features, which relies heavily on high-level features that miss fine-grained temporal dynamics. In this work, we propose a Temporal Correlation Module (TCM), which can be easily embedded into current action recognition backbones in a plug-and-play manner, to extract action visual tempo from low-level backbone features at a single layer. Specifically, our TCM contains two main components: a Multi-scale Temporal Dynamics Module (MTDM) and a Temporal Attention Module (TAM). MTDM applies a…
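
The multi-rate sampling that the abstract attributes to previous methods can be illustrated with a minimal sketch. It is not taken from the paper or from any specific prior method: the function names `sample_clip` and `multi_rate_clips`, the rule of clamping to the last frame, and the stride values are all assumptions chosen for illustration, and integers stand in for image frames.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def sample_clip(frames: Sequence[T], num_frames: int, stride: int) -> List[T]:
    """Take `num_frames` frames spaced `stride` apart, starting at frame 0.

    Hypothetical helper: indices past the end of the video are clamped
    to the last frame (an assumption, not a rule from the paper).
    """
    if num_frames <= 0 or stride <= 0:
        raise ValueError("num_frames and stride must be positive")
    if len(frames) == 0:
        raise ValueError("frames must be non-empty")
    last = len(frames) - 1
    return [frames[min(i * stride, last)] for i in range(num_frames)]


def multi_rate_clips(
    frames: Sequence[T], num_frames: int, strides: Sequence[int]
) -> Dict[int, List[T]]:
    """Build one clip per sampling stride from the same raw video."""
    return {s: sample_clip(frames, num_frames, s) for s in strides}


# Stand-in video: frame indices 0..31 instead of real images.
video = list(range(32))

# A small stride covers a short time span (fast tempo);
# a large stride covers a long time span (slow tempo).
clips = multi_rate_clips(video, num_frames=4, strides=(1, 2, 8))
for stride, clip in clips.items():
    print(stride, clip)
```

In a multi-rate pipeline of this kind, each clip would be handled by its own network pathway, which is the cost the abstract describes; TCM instead aims to recover tempo information from low-level backbone features at a single layer.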

Taxonomy

Methods: Temporal Adaptive Module · Low-level backbone