A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion Compensation for Action Recognition in the EPIC-Kitchens Dataset
Alejandro López-Cifuentes, Marcos Escudero-Viñolo, Jesús Bescós

TL;DR
This paper introduces a method for action recognition in egocentric videos that compensates for ego-motion by estimating the camera motion and partitioning each sequence into motion-compensated chunks with stable backgrounds, from which temporal features are extracted.
Contribution
It proposes a novel ego-motion compensation and content-driven temporal sampling approach that improves CNN-based action recognition in egocentric datasets.
Findings
Enhanced temporal receptive field of CNNs for action recognition.
Improved accuracy on the EPIC-Kitchens dataset.
Effective handling of ego-motion in first-person videos.
Abstract
Action recognition is currently one of the most challenging research fields in computer vision. Convolutional Neural Networks (CNNs) have significantly boosted its performance, but they rely on fixed-size spatio-temporal windows of analysis, which limits their temporal receptive fields. Among action recognition datasets, egocentric recorded sequences have become highly relevant, while entailing an additional challenge: ego-motion is unavoidably transferred to these sequences. The proposed method aims to cope with this by estimating the ego-motion, or camera motion. The estimate is used to temporally partition video sequences into motion-compensated temporal chunks that show the action under stable backgrounds and allow for content-driven temporal sampling. A CNN trained in an end-to-end fashion is used to extract temporal features from each chunk, which are late fused.…
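The chunk-then-fuse idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the partition is computed or how the fusion is done, so the accumulated-motion threshold rule and the score averaging below are assumptions, and the names partition_into_chunks, late_fuse, and threshold are hypothetical. The per-frame motion magnitudes are taken as given, for example from some camera-motion estimator.

```python
from typing import List, Sequence, Tuple


def partition_into_chunks(
    motion: Sequence[float], threshold: float
) -> List[Tuple[int, int]]:
    """Split frames into chunks as half-open index ranges (start, end).

    ASSUMPTION: a chunk is closed once the camera motion accumulated
    since its first frame exceeds `threshold`. The source does not
    specify the actual partition criterion.
    """
    chunks: List[Tuple[int, int]] = []
    start = 0
    accumulated = 0.0
    for i, magnitude in enumerate(motion):
        accumulated += magnitude
        if accumulated > threshold:
            chunks.append((start, i + 1))
            start = i + 1
            accumulated = 0.0
    # Keep any trailing frames as a final chunk.
    if start < len(motion):
        chunks.append((start, len(motion)))
    return chunks


def late_fuse(chunk_scores: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-chunk class scores into one score vector.

    ASSUMPTION: fusion is a plain average over chunks; the source only
    says the chunk features are "late fused".
    """
    if not chunk_scores:
        raise ValueError("need at least one chunk")
    num_classes = len(chunk_scores[0])
    return [
        sum(scores[c] for scores in chunk_scores) / len(chunk_scores)
        for c in range(num_classes)
    ]


# Toy per-frame motion magnitudes: a burst of camera motion at frame 2.
motion = [0.1, 0.1, 0.9, 0.1, 0.1, 0.1]
chunks = partition_into_chunks(motion, threshold=1.0)

# Stand-in for per-chunk CNN outputs (two classes, one row per chunk).
fused = late_fuse([[0.2, 0.8], [0.6, 0.4]])
```

With this rule, calm stretches of video yield long chunks and shaky stretches yield short ones, which is one way to make the temporal sampling depend on content rather than on a fixed window size.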
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
