Transformer-based Action recognition in hand-object interacting scenarios
Hoseong Cho, Seungryul Baek

TL;DR
This paper presents a Transformer-based framework for recognizing hand-object interaction actions in egocentric videos. It placed 2nd in the ECCV 2022 HBHA Action Recognition challenge with 87.19% top-1 accuracy on the test set.
Contribution
It estimates the keypoints of both hands and the object with a Transformer-based keypoint estimator, then recognizes the action in egocentric video from these estimated keypoints.
Findings
Achieved 87.19% top-1 accuracy on the test set.
Placed 2nd in the ECCV 2022 HBHA Action Recognition challenge.
Showed that a Transformer-based keypoint estimator provides an effective basis for keypoint-driven action recognition.
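Top-1 accuracy, the metric reported above, counts a clip as correct only when the single highest-scoring class matches the ground-truth label. A minimal NumPy sketch with made-up scores (the function name `top1_accuracy` is illustrative, not from the paper):

```python
import numpy as np


def top1_accuracy(logits, labels):
    """Fraction of clips whose highest-scoring class equals the ground-truth label."""
    preds = np.asarray(logits).argmax(axis=1)  # index of the top class per clip
    return float((preds == np.asarray(labels)).mean())


# Made-up scores for 4 clips over 3 classes (illustrative only).
logits = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.0, 0.4, 0.9],
                   [0.7, 0.6, 0.2]])
labels = np.array([1, 0, 1, 0])
print(top1_accuracy(logits, labels))  # 0.75: clip 3 predicts class 2, label is 1
```

Under this definition, 87.19% means roughly 87 of every 100 test clips had the correct action ranked first.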
Abstract
This report describes the 2nd place solution to the ECCV 2022 Human Body, Hands, and Activities (HBHA) from Egocentric and Multi-view Cameras Challenge: Action Recognition. This challenge aims to recognize hand-object interaction in an egocentric view. We propose a framework that estimates keypoints of two hands and an object with a Transformer-based keypoint estimator and recognizes actions based on the estimated keypoints. We achieved a top-1 accuracy of 87.19% on the test set.
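The abstract describes a two-stage pipeline: estimate hand and object keypoints, then classify the action from those keypoints. The sketch below illustrates only the second stage, under loudly stated assumptions: the keypoint layout (21 joints per hand, 8 object points, 3D coordinates), the single-layer, single-head encoder without layer normalization, mean pooling over frames, the class count, and all names (`KeypointActionClassifier`, `sinusoidal_positions`) are hypothetical and not taken from the paper. Weights are random, so it shows tensor shapes and data flow rather than a trained model.

```python
import numpy as np

# Hypothetical keypoint layout (assumption, not from the paper).
NUM_HAND_JOINTS = 21
NUM_OBJ_POINTS = 8
NUM_KEYPOINTS = 2 * NUM_HAND_JOINTS + NUM_OBJ_POINTS  # two hands + one object
COORD_DIM = 3


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sinusoidal_positions(num_frames, dim):
    """Standard sinusoidal positional encoding over the frame index."""
    pos = np.arange(num_frames)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))


class KeypointActionClassifier:
    """Toy single-layer, single-head self-attention encoder over frames.

    Each frame's keypoints are flattened into one token (an assumption).
    Weights are random: this illustrates shapes, not trained behavior.
    """

    def __init__(self, num_classes, dim=64, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = NUM_KEYPOINTS * COORD_DIM
        scale = 1.0 / np.sqrt(dim)
        self.dim = dim
        self.w_embed = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, dim))
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        self.w_cls = rng.normal(0.0, scale, (dim, num_classes))

    def __call__(self, keypoints):
        # keypoints: (T, NUM_KEYPOINTS, COORD_DIM) for one clip of T frames
        t = keypoints.shape[0]
        tokens = keypoints.reshape(t, -1) @ self.w_embed  # (T, dim) frame tokens
        tokens = tokens + sinusoidal_positions(t, self.dim)
        q, k, v = tokens @ self.w_q, tokens @ self.w_k, tokens @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(self.dim), axis=-1)  # (T, T) weights
        tokens = tokens + attn @ v  # residual connection
        clip_feature = tokens.mean(axis=0)  # temporal mean pooling
        return softmax(clip_feature @ self.w_cls)  # class probabilities


# Usage: a random 16-frame clip and a placeholder count of 10 action classes.
clip = np.random.default_rng(1).normal(size=(16, NUM_KEYPOINTS, COORD_DIM))
model = KeypointActionClassifier(num_classes=10)
probs = model(clip)
print(probs.shape, int(probs.argmax()))
```

Treating each frame as one token keeps attention cost quadratic in the number of frames rather than in the number of keypoints; tokenizing individual keypoints is an equally plausible design, and this sketch does not assume which one the authors used.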
