Efficient Egocentric Action Recognition with Multimodal Data
Marco Calzavara, Ard Kastrati, Matteo Macchini, Dushan Vasilevski, Roger Wattenhofer

TL;DR
This paper investigates how adjusting sampling rates of RGB video and 3D hand pose data affects egocentric action recognition, demonstrating that multimodal sampling strategies can maintain accuracy while reducing computational load on XR devices.
Contribution
It provides a systematic analysis of sampling frequency trade-offs in multimodal EAR, highlighting strategies to optimize accuracy and efficiency for real-time XR applications.
Findings
Reducing the RGB frame sampling rate while keeping high-frequency 3D hand pose input preserves accuracy.
CPU usage drops by up to 3x without loss of recognition performance.
Multimodal sampling strategies enable efficient real-time EAR on XR devices.
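The mixed-rate idea in the findings above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `subsample_indices`, the 30 Hz capture rate, and the 5 Hz RGB rate are all hypothetical values chosen for illustration, and the sketch assumes the target rate divides the capture rate evenly.

```python
def subsample_indices(num_samples: int, source_hz: int, target_hz: int) -> list[int]:
    """Return the indices kept when a stream captured at source_hz is
    reduced to target_hz by taking every k-th sample.

    Assumption (not from the paper): target_hz divides source_hz evenly.
    """
    if source_hz <= 0 or target_hz <= 0:
        raise ValueError("rates must be positive")
    if source_hz % target_hz != 0:
        raise ValueError("target_hz must divide source_hz evenly")
    stride = source_hz // target_hz
    return list(range(0, num_samples, stride))


# Hypothetical 2-second clip in which both modalities are captured at 30 Hz.
CAPTURE_HZ = 30
NUM_SAMPLES = 2 * CAPTURE_HZ

# Illustrative rates only: sparse RGB frames, dense 3D hand pose.
rgb_indices = subsample_indices(NUM_SAMPLES, CAPTURE_HZ, target_hz=5)
pose_indices = subsample_indices(NUM_SAMPLES, CAPTURE_HZ, target_hz=30)

print(len(rgb_indices), "RGB frames,", len(pose_indices), "pose samples")
```

In this toy configuration the expensive RGB stream contributes 10 frames while the lightweight pose stream keeps all 60 samples, which is the kind of asymmetry the findings describe. The actual rates and savings reported in the paper may differ.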
Abstract
The increasing availability of wearable XR devices opens new perspectives for Egocentric Action Recognition (EAR) systems, which can provide deeper human understanding and situation awareness. However, deploying real-time algorithms on these devices can be challenging due to the inherent trade-offs between portability, battery life, and computational resources. In this work, we systematically analyze the impact of sampling frequency across different input modalities - RGB video and 3D hand pose - on egocentric action recognition performance and CPU usage. By exploring a range of configurations, we provide a comprehensive characterization of the trade-offs between accuracy and computational efficiency. Our findings reveal that reducing the sampling rate of RGB frames, when complemented with higher-frequency 3D hand pose input, can preserve high accuracy while significantly lowering CPU…
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Anomaly Detection Techniques and Applications
