An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition
Song-Jiang Lai, Tsun-Hin Cheung, Ka-Chun Fung, Tian-Shan Liu, Kin-Man Lam

TL;DR
This paper introduces a novel end-to-end two-stream neural network for egocentric human action recognition that replaces optical flow with a representation flow algorithm, significantly reducing computational costs while maintaining or improving accuracy.
Contribution
The paper proposes a representation flow-based two-stream network that enables end-to-end training and reduces prediction time in egocentric action recognition models.
Findings
Achieves comparable or better accuracy than traditional models on GTEA61, EGTEA GAZE+, and HMDB datasets.
Reduces prediction runtime from over 100 seconds to under 0.2 seconds per inference, a speedup of more than 500x.
Demonstrates the effectiveness of class activation maps and ConvLSTM in improving recognition performance.
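As a rough illustration of the class activation map idea mentioned above, the following sketch computes a CAM in the standard way (a class-weighted sum of the last convolutional feature maps) and turns it into a spatial attention mask. It is a minimal example under stated assumptions, not the authors' implementation: the function names, tensor shapes, random inputs, and the softmax normalization are all hypothetical choices made for illustration.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM: weight each feature map by the class's linear-layer weight.

    features:   (K, H, W) feature maps from the last conv layer (assumed shape)
    fc_weights: (C, K) weights of a linear classifier applied after
                global average pooling (assumed, bias-free)
    """
    # Contract the K axis: sum_k w[class_idx, k] * features[k] -> (H, W)
    return np.tensordot(fc_weights[class_idx], features, axes=1)

def spatial_attention(cam):
    """Softmax over all spatial positions so the weights sum to 1 (assumption)."""
    flat = cam.reshape(-1)
    flat = flat - flat.max()  # subtract the max for numerical stability
    weights = np.exp(flat) / np.exp(flat).sum()
    return weights.reshape(cam.shape)

# Hypothetical inputs: 8 feature maps of size 7x7 and 5 action classes.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 7, 7))
fc_weights = rng.standard_normal((5, 8))

# Class scores from global average pooling followed by the linear layer.
scores = fc_weights @ features.mean(axis=(1, 2))
top = int(scores.argmax())

cam = class_activation_map(features, fc_weights, top)
attn = spatial_attention(cam)
attended = features * attn  # (H, W) mask broadcast across the K channels
```

With a bias-free classifier, the spatial mean of the CAM equals the score of the chosen class, which is a quick sanity check on the weighting. In a pipeline like the one summarized here, features weighted this way would then be passed to a recurrent encoder such as a ConvLSTM.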
Abstract
With the rapid advancements in deep learning, computer vision tasks have seen significant improvements, making two-stream neural networks a popular focus for video-based action recognition. Traditional models using RGB and optical flow streams achieve strong performance but at a high computational cost. To address this, we introduce a representation flow algorithm to replace the optical flow branch in the egocentric action recognition model, enabling end-to-end training while reducing computational cost and prediction time. Our model, designed for egocentric action recognition, uses class activation maps (CAMs) to improve accuracy and ConvLSTM for spatio-temporal encoding with spatial attention. When evaluated on the GTEA61, EGTEA GAZE+, and HMDB datasets, our model matches the accuracy of the original model on GTEA61 and exceeds it by 0.65% and 0.84% on EGTEA GAZE+ and HMDB,…
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
Methods: Tanh Activation · Convolution · Sigmoid Activation · ConvLSTM · Class-activation map · Focus
