Multi-Granularity Hand Action Detection
Ting Zhe, Jing Zhang, Yongqian Li, Yong Luo, Han Hu, Dacheng Tao

TL;DR
This paper introduces FHA-Kitchens, a dataset for fine-grained hand action detection in videos, and proposes MG-HAD, an end-to-end method for detecting hand actions at multiple granularities.
Contribution
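The paper contributes FHA-Kitchens, a dataset with coarse- and fine-grained hand action categories and localization annotations, and MG-HAD, an end-to-end detection method for multi-granularity hand actions. Together they address the lack of fine-grained hand-action localization in existing approaches.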
Findings
MG-HAD outperforms existing methods in multi-granularity detection.
The FHA-Kitchens dataset enables comprehensive evaluation of hand action detection.
Multi-granularity approach improves localization and classification accuracy.
Abstract
Detecting hand actions in videos is crucial for understanding video content and has diverse real-world applications. Existing approaches often focus on whole-body actions or coarse-grained action categories, lacking fine-grained hand-action localization information. To fill this gap, we introduce the FHA-Kitchens (Fine-Grained Hand Actions in Kitchen Scenes) dataset, providing both coarse- and fine-grained hand action categories along with localization annotations. This dataset comprises 2,377 video clips and 30,047 frames, annotated with approximately 200k bounding boxes and 880 action categories. Evaluation of existing action detection methods on FHA-Kitchens reveals varying generalization capabilities across different granularities. To handle multi-granularity in hand actions, we propose MG-HAD, an End-to-End Multi-Granularity Hand Action Detection method. It incorporates two new…
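The dataset figures quoted in the abstract can be turned into rough per-clip and per-frame averages to get a feel for annotation density. The following is a minimal sketch, not part of the paper: the function name `annotation_density` is hypothetical, and the bounding-box count uses the approximate 200k figure, so the results are estimates only.

```python
# Counts quoted in the abstract. The box count is approximate
# ("approximately 200k"), so derived values are rough estimates.
NUM_CLIPS = 2_377
NUM_FRAMES = 30_047
APPROX_NUM_BOXES = 200_000


def annotation_density(clips: int, frames: int, boxes: int) -> dict[str, float]:
    """Return average annotation densities from raw dataset counts.

    Hypothetical helper for illustration; it is not taken from the paper.
    """
    return {
        "frames_per_clip": frames / clips,
        "boxes_per_frame": boxes / frames,
        "boxes_per_clip": boxes / clips,
    }


stats = annotation_density(NUM_CLIPS, NUM_FRAMES, APPROX_NUM_BOXES)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the averages come to about 12.6 frames per clip and roughly 6.7 bounding boxes per frame. They say nothing about how boxes are distributed across individual clips or categories.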
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Multimodal Machine Learning Applications
