Semi-Coupled Two-Stream Fusion ConvNets for Action Recognition at Extremely Low Resolutions
Jiawei Chen, Jonathan Wu, Janusz Konrad, Prakash Ishwar

TL;DR
This paper introduces a semi-coupled filter-sharing network that improves action recognition in extremely low-resolution (eLR) videos by leveraging high-resolution (HR) videos during training and by fusing spatial (appearance) and temporal (motion) ConvNets.
Contribution
It proposes a novel semi-coupled network architecture and fusion methods specifically designed for eLR videos, improving recognition accuracy over existing approaches.
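The filter-sharing idea can be illustrated with a minimal sketch. It assumes that each convolutional layer holds one bank of filters used by both the HR and the eLR branch, plus a bank private to each branch. Only the eLR branch is needed at test time, which matches the summary's statement that HR videos assist the eLR ConvNet during training. The class and function names, filter values, and shared/private split below are hypothetical illustrations. They are not the paper's actual configuration, split ratio, or training procedure. For simplicity the sketch uses single-channel 2D cross-correlation.

```python
from typing import Dict, List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D cross-correlation with 'valid' padding (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


class SemiCoupledLayer:
    """Hypothetical semi-coupled layer: some filters shared, the rest branch-private."""

    def __init__(self, shared: List[Matrix], private_hr: List[Matrix],
                 private_elr: List[Matrix]) -> None:
        # The same list object serves both branches, so any update to a
        # shared filter (e.g., from either branch's gradient) affects both.
        self.shared = shared
        self.private: Dict[str, List[Matrix]] = {"hr": private_hr, "elr": private_elr}

    def filters(self, branch: str) -> List[Matrix]:
        # Shared filters first, then the filters owned by this branch only.
        return self.shared + self.private[branch]

    def forward(self, image: Matrix, branch: str) -> List[Matrix]:
        # One output feature map per filter in the branch's filter bank.
        return [conv2d_valid(image, k) for k in self.filters(branch)]


# Hypothetical 2x2 filters, for illustration only.
shared_filters = [[[1.0, 0.0], [0.0, 1.0]]]
hr_filters = [[[1.0, 1.0], [1.0, 1.0]]]
elr_filters = [[[0.0, 1.0], [1.0, 0.0]]]
layer = SemiCoupledLayer(shared_filters, hr_filters, elr_filters)

frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
hr_maps = layer.forward(frame, "hr")    # feature maps seen by the HR branch
elr_maps = layer.forward(frame, "elr")  # feature maps seen by the eLR branch
```

Both branches produce the same first feature map because they share that filter. The second map differs because each branch has its own private filter. In a real training setup, both branches would contribute gradient to the shared filters, so HR data could shape the features the eLR branch relies on.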
Findings
Achieved 93.7% accuracy on the IXMAS dataset
Achieved 29.2% accuracy on the HMDB dataset
Outperformed state-of-the-art methods at eLR
Abstract
Deep convolutional neural networks (ConvNets) have been recently shown to attain state-of-the-art performance for action recognition on standard-resolution videos. However, less attention has been paid to recognition performance at extremely low resolutions (eLR) (e.g., 16 × 12 pixels). Reliable action recognition using eLR cameras would address privacy concerns in various application environments such as private homes, hospitals, nursing/rehabilitation facilities, etc. In this paper, we propose a semi-coupled filter-sharing network that leverages high-resolution (HR) videos during training in order to assist an eLR ConvNet. We also study methods for fusing spatial and temporal ConvNets customized for eLR videos in order to take advantage of appearance and motion information. Our method outperforms state-of-the-art methods at extremely low resolutions on IXMAS (93.7%) and HMDB (29.2%)…
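Combining a spatial (appearance) stream with a temporal (motion) stream can be done in several ways. The sketch below shows one common, simple option: score-level ("late") fusion, which takes a weighted average of each stream's per-class softmax probabilities. This is a generic technique, shown for illustration. It is not necessarily one of the eLR-customized fusion methods studied in the paper, which may instead fuse at intermediate layers. The logits and the weight parameter `w_spatial` are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw class scores into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(spatial_logits: List[float], temporal_logits: List[float],
                w_spatial: float = 0.5) -> List[float]:
    """Weighted average of the two streams' per-class probabilities."""
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= w_spatial <= 1.0:
        raise ValueError("w_spatial must lie in [0, 1]")
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return [w_spatial * s + (1.0 - w_spatial) * t
            for s, t in zip(p_spatial, p_temporal)]


def predict(fused: List[float]) -> int:
    """Index of the most probable action class."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for three action classes from each stream.
spatial = [2.0, 1.0, 0.1]   # appearance stream favours class 0
temporal = [0.5, 2.5, 0.2]  # motion stream favours class 1
fused = late_fusion(spatial, temporal)
```

With equal weights, the more confident motion stream decides the outcome (class 1). Setting `w_spatial=1.0` falls back to the appearance stream alone (class 0). At eLR, where appearance detail is scarce, the balance between the two streams is a tunable design choice rather than a fixed rule.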
Taxonomy
Topics: Human Pose and Action Recognition · Diabetic Foot Ulcer Assessment and Management · Anomaly Detection Techniques and Applications
