An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition

Song-Jiang Lai; Tsun-Hin Cheung; Ka-Chun Fung; Tian-Shan Liu; Kin-Man Lam

arXiv:2411.18002·cs.CV·November 28, 2024

TL;DR

This paper introduces a novel end-to-end two-stream neural network for egocentric human action recognition that replaces optical flow with a representation flow algorithm, significantly reducing computational costs while maintaining or improving accuracy.

Contribution

The paper proposes a representation flow-based two-stream network that enables end-to-end training and reduces prediction time in egocentric action recognition models.

Findings

1. Achieves comparable or better accuracy than traditional models on GTEA61, EGTEA GAZE+, and HMDB datasets.

2. Reduces prediction runtime from over 100 seconds to under 0.2 seconds per inference.

3. Demonstrates the effectiveness of class activation maps and ConvLSTM in improving recognition performance.

Abstract

With the rapid advancement of deep learning, computer vision tasks have improved significantly, and two-stream neural networks have become a popular approach to video-based action recognition. Traditional models that use RGB and optical flow streams achieve strong performance, but at a high computational cost. To address this, we introduce a representation flow algorithm to replace the optical flow branch in the egocentric action recognition model, enabling end-to-end training while reducing computational cost and prediction time. Our model, designed for egocentric action recognition, uses class activation maps (CAMs) to improve accuracy and ConvLSTM for spatio-temporal encoding with spatial attention. When evaluated on the GTEA61, EGTEA GAZE+, and HMDB datasets, our model matches the accuracy of the original model on GTEA61 and exceeds it by 0.65% and 0.84% on EGTEA GAZE+ and HMDB,…
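The abstract mentions class activation maps used together with spatial attention. As a rough illustration only, the sketch below computes a CAM in the standard way, as a weighted sum of feature maps using one class's classifier weights, and then normalizes it with a spatial softmax. The softmax step, the function names, and the toy 2x2 feature maps are assumptions made for this example; the abstract does not specify how the paper turns a CAM into attention.

```python
import math

def class_activation_map(features, class_weights):
    """Weighted sum of K feature maps (each H x W) for one class.

    features: list of K maps, each a list of H rows of W floats.
    class_weights: list of K floats (classifier weights for the class).
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(features, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam

def spatial_softmax(cam):
    """Normalize a map so its entries are positive and sum to 1 (assumed step)."""
    peak = max(v for row in cam for v in row)  # subtract max for stability
    exps = [[math.exp(v - peak) for v in row] for row in cam]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

# Toy input: two 2x2 feature maps and hypothetical weights for one class.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
weights = [0.5, 1.5]

cam = class_activation_map(features, weights)
attention = spatial_softmax(cam)
```

In this toy case the bottom-right location receives the largest attention weight because the heavily weighted second feature map fires there. In a two-stream model such a map would typically rescale the convolutional features before they reach the recurrent encoder, though that wiring is not shown here.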

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods

Methods: Tanh Activation · Convolution · Sigmoid Activation · ConvLSTM · Class-activation map · Focus