An end-to-end multi-scale network for action prediction in videos

Xiaofa Liu; Jianqin Yin; Yuan Sun; Zhicheng Zhang; Jin Tang

arXiv:2301.01216·cs.CV·January 4, 2023

TL;DR

This paper introduces an end-to-end multi-scale neural network that predicts actions in partial videos by modeling motion at different temporal scales, improving efficiency and effectiveness over prior methods.

Contribution

The proposed E2EMSNet combines segment-scale and observed-global-scale modeling in a single end-to-end framework for action prediction in videos.

Findings

01

Effective on BIT, HMDB51, and UCF101 datasets.

02

Outperforms existing methods in action prediction accuracy.

03

Maintains low computational cost.

Abstract

In this paper, we develop an efficient multi-scale network to predict action classes in partial videos in an end-to-end manner. Unlike most existing methods, which rely on offline feature generation, our method takes frames directly as input and models motion evolution on two different temporal scales. This avoids the complexity of two-stage modeling and the insufficient temporal and spatial information of a single scale. Our proposed End-to-End MultiScale Network (E2EMSNet) is composed of two scales, named the segment scale and the observed global scale. The segment scale leverages temporal differences over consecutive frames to capture finer motion patterns, using 2D convolutions. For the observed global scale, a Long Short-Term Memory (LSTM) network is incorporated to capture motion features of the observed frames. Our model provides a simple and efficient modeling…
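The two scales described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`observe`, `segment_differences`, `segment_features`, `run_lstm`), the segment length, the 2x2 pooling used as a stand-in for the 2D-convolutional encoder, and the random untrained LSTM weights are all assumptions made only to show the data flow from a partial video to a single observed-global feature vector.

```python
import numpy as np

def observe(frames, ratio):
    """Keep the first `ratio` fraction of frames, i.e. a partial video."""
    n = max(2, int(round(len(frames) * ratio)))
    return frames[:n]

def segment_differences(frames, segment_len):
    """Split frames into consecutive segments and take the difference
    between neighbouring frames inside each segment."""
    num_segments = len(frames) // segment_len
    usable = frames[: num_segments * segment_len]
    segs = usable.reshape(num_segments, segment_len, *frames.shape[1:])
    return segs[:, 1:] - segs[:, :-1]  # shape (S, L-1, H, W)

def segment_features(diffs):
    """Hypothetical stand-in for a 2D-conv encoder: mean absolute motion
    pooled over a 2x2 spatial grid (H and W are assumed to be even)."""
    s, _, h, w = diffs.shape
    motion = np.abs(diffs).mean(axis=1)  # (S, H, W)
    pooled = motion.reshape(s, 2, h // 2, 2, w // 2).mean(axis=(2, 4))
    return pooled.reshape(s, 4)  # one 4-dim vector per segment

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_lstm(features, hidden, seed=0):
    """Run a single-layer LSTM with random (untrained) weights over the
    segment features and return the final hidden state."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    W = rng.normal(scale=0.1, size=(4 * hidden, d))
    U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
    b = np.zeros(4 * hidden)
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in features:
        i, f, o, g = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
        h = sigmoid(o) * np.tanh(c)                   # update hidden state
    return h

# Toy usage: 16 grayscale 8x8 frames, of which only the first half is observed.
video = np.random.default_rng(0).random((16, 8, 8))
partial = observe(video, ratio=0.5)
feats = segment_features(segment_differences(partial, segment_len=4))
global_state = run_lstm(feats, hidden=5)
```

In a trained model, a classifier over `global_state` would produce the action prediction; here the vector only demonstrates how segment-level motion cues feed a recurrent summary of the observed frames.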

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Video Analysis and Summarization