Long-Short Temporal Modeling for Efficient Action Recognition

Liyu Wu; Yuexian Zou; Can Zhang

arXiv:2106.15787 · cs.CV · July 1, 2021

TL;DR

This paper introduces MENet, a two-stream network with a Motion Enhancement (ME) module and a Video-level Aggregation (VLA) module for long-short temporal modeling in action recognition. It reports the network's effectiveness on the UCF101 and HMDB51 benchmarks.

Contribution

The paper proposes MENet, a new two-stream network that models both long- and short-term temporal dependencies for action recognition.

Findings

01

MENet outperforms the compared methods on the UCF101 and HMDB51 benchmarks.

02

The Motion Enhancement module improves short-term motion representation.

03

The Video-level Aggregation module captures long-term dependencies efficiently.

Abstract

Efficient long-short temporal modeling is key to improving performance on the action recognition task. In this paper, we propose a new two-stream action recognition network, termed MENet, which consists of a Motion Enhancement (ME) module and a Video-level Aggregation (VLA) module that together achieve long-short temporal modeling. Motion representations have been shown to be effective in capturing short-term, high-frequency actions. However, current motion representations are calculated from adjacent frames, so they may be poorly interpretable and may introduce useless (noisy or blank) information. For short-term motions, we therefore design an efficient ME module that enhances them by mingling the motion saliency among neighboring segments. For long-term aggregation, VLA is placed on top of the appearance branch to integrate the long-term dependencies across all segments.…
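The abstract describes the two modules only at a high level. The Python sketch below illustrates the data flow it outlines and is not the authors' implementation. It makes several assumptions that do not come from the paper. Each segment is summarized by a small feature vector, and short-term motion is the difference between two adjacent frames. "Motion saliency" is taken to be a sigmoid of the motion magnitude, and "mingling" is modeled as averaging that saliency over neighboring segments. VLA is replaced by a plain mean over all segments of the appearance branch. All function names (`frame_difference`, `saliency`, `enhance_motions`, `video_level_aggregate`) are hypothetical.

```python
"""Hypothetical sketch of long-short temporal modeling in the spirit of MENet.

Assumptions (NOT from the paper): one feature vector per frame; motion is the
difference of two adjacent frames; saliency is sigmoid(|motion|); neighboring
saliencies are averaged; long-term aggregation is a plain mean over segments.
"""
import math
from typing import List

Vector = List[float]


def frame_difference(frame_a: Vector, frame_b: Vector) -> Vector:
    """Short-term motion cue: element-wise difference of two adjacent frames."""
    return [b - a for a, b in zip(frame_a, frame_b)]


def saliency(motion: Vector) -> Vector:
    """Per-channel saliency in [0.5, 1): sigmoid of the absolute motion."""
    return [1.0 / (1.0 + math.exp(-abs(m))) for m in motion]


def enhance_motions(motions: List[Vector], radius: int = 1) -> List[Vector]:
    """Hypothetical ME-style step: reweight each segment's motion by the
    saliency averaged over itself and its neighbors within `radius`."""
    sal = [saliency(m) for m in motions]
    enhanced = []
    for t, motion in enumerate(motions):
        start, stop = max(0, t - radius), min(len(motions), t + radius + 1)
        window = sal[start:stop]
        # Average each channel's saliency across the neighboring segments.
        mixed = [sum(s[c] for s in window) / len(window) for c in range(len(motion))]
        enhanced.append([m * w for m, w in zip(motion, mixed)])
    return enhanced


def video_level_aggregate(segment_features: List[Vector]) -> Vector:
    """Stand-in for VLA: average appearance features over all segments."""
    n = len(segment_features)
    return [sum(col) / n for col in zip(*segment_features)]


# Three segments, each given as two adjacent 2-channel "frames".
segments = [([0.0, 1.0], [0.5, 1.0]), ([1.0, 0.0], [1.0, 2.0]), ([2.0, 2.0], [2.0, 2.0])]
motions = [frame_difference(a, b) for a, b in segments]         # motion branch input
short_term = enhance_motions(motions)                             # short-term modeling
long_term = video_level_aggregate([a for a, _ in segments])       # appearance branch
print(long_term)  # [1.0, 1.0]
```

Because each weight lies in [0.5, 1), the enhancement can only damp a motion value, never amplify it. A segment whose motion is blank (all zeros) stays blank. A learned module would replace these fixed rules with trainable layers.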

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Diabetic Foot Ulcer Assessment and Management