Dynamic gesture retrieval: searching videos by human pose sequence
Cheng Zhang

TL;DR
This paper introduces a novel method for retrieving videos based on dynamic human pose sequences, enabling more precise searches by matching sequences of poses rather than static poses.
Contribution
It presents a new approach that converts 3D human poses into bone direction descriptors and uses a temporal pyramid sliding window for matching, allowing effective retrieval of videos with dynamic gestures.
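As a rough illustration of what a bone direction descriptor could look like, the sketch below turns one 3D pose into a list of unit vectors, one per bone, and compares two poses by the mean cosine distance of matching bones. The five-joint skeleton in `BONES`, the function names, and the cosine-based distance are assumptions made for this example; the summary does not specify the skeleton layout or the distance the paper uses.

```python
import math

# Hypothetical 5-joint skeleton given as (parent, child) joint index pairs.
# The real skeleton layout is not stated in the summary.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def bone_directions(pose, bones=BONES):
    """Return one unit direction vector per bone for a list of (x, y, z) joints."""
    descriptor = []
    for parent, child in bones:
        dx, dy, dz = (c - p for p, c in zip(pose[parent], pose[child]))
        length = math.sqrt(dx * dx + dy * dy + dz * dz)
        if length == 0.0:
            # Degenerate bone (coincident joints): use a zero vector.
            descriptor.append((0.0, 0.0, 0.0))
        else:
            descriptor.append((dx / length, dy / length, dz / length))
    return descriptor


def pose_distance(desc_a, desc_b):
    """Mean of (1 - cosine similarity) over bones; 0 means identical directions."""
    total = 0.0
    for a, b in zip(desc_a, desc_b):
        total += 1.0 - sum(x * y for x, y in zip(a, b))
    return total / len(desc_a)
```

Because only directions are kept, a descriptor built this way does not change when the whole skeleton is translated or uniformly scaled, which is one plausible reason to prefer it over raw joint coordinates.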
Findings
Effective retrieval of videos with specific human pose sequences.
Handles variations in gesture duration through temporal pyramid matching.
Outperforms static pose-based retrieval methods.
Abstract
Because the number of static human poses is limited, it is hard to retrieve a specific video using a single pose as the clue. With a pose sequence or dynamic gesture as the keyword, however, retrieving specific videos becomes more feasible. We propose a novel method for querying videos that contain a designated sequence of human poses, whereas previous work designates only a single static pose. The proposed method takes continuous 3D human poses from the keyword gesture video and the candidate videos, then converts the pose in each frame into bone direction descriptors, which describe the direction of each natural connection in the articulated pose. A temporal pyramid sliding window is then applied to find matches between the designated gesture and the candidate videos, which ensures that the same gesture performed over different durations can still be matched.
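One way to picture the multi-scale sliding window is the minimal sketch below. It slides windows of several lengths over a candidate sequence, resamples each window to the query length, and keeps the window with the lowest mean per-frame cost. The scale set, the nearest-index resampling, and the names `resample` and `pyramid_search` are assumptions for illustration only; the summary does not describe how the paper builds its temporal pyramid.

```python
def resample(frames, target_len):
    """Pick target_len frames at evenly spaced positions (nearest-index sampling)."""
    n = len(frames)
    if target_len == 1:
        return [frames[0]]
    return [frames[round(i * (n - 1) / (target_len - 1))] for i in range(target_len)]


def pyramid_search(query, candidate, frame_distance, scales=(0.5, 1.0, 2.0)):
    """Return (best_cost, start, window_len) over all window scales and positions.

    frame_distance(a, b) is any per-frame cost supplied by the caller.
    Returns None if no window fits inside the candidate.
    """
    best = None
    for scale in scales:
        # Window length for this pyramid level (assumed scale set).
        win = max(2, round(len(query) * scale))
        if win > len(candidate):
            continue
        for start in range(len(candidate) - win + 1):
            window = resample(candidate[start:start + win], len(query))
            cost = sum(frame_distance(q, w) for q, w in zip(query, window)) / len(query)
            if best is None or cost < best[0]:
                best = (cost, start, win)
    return best
```

With toy one-dimensional "descriptors", a query `[0, 1, 2, 3]` searched in `[9, 9, 0, 0, 1, 1, 2, 2, 3, 3, 9, 9]` using `lambda a, b: abs(a - b)` returns `(0.0, 2, 8)`: the same gesture performed at half speed is found at the window of length 8 starting at frame 2.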
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Human Motion and Animation
