Deep Convolutional Poses for Human Interaction Recognition in Monocular Videos
Marcel Sheeny de Moraes, Sankha Mukherjee, Neil M Robertson

TL;DR
This paper demonstrates that deep pose estimation from monocular RGB videos can effectively recognize human interactions, achieving high accuracy comparable to depth sensor methods, thus enabling interaction recognition with standard cameras.
Contribution
It introduces a five-step method that uses deep pose estimation for interaction recognition in monocular videos, showing that a standard RGB camera can come close to depth-sensor performance.
Findings
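Achieved 87.56% average accuracy over all interactions on the Two-Person interaction dataset, using whole-sequence evaluation.
The RGB-only method comes within about 3.5 percentage points of the depth-sensor result reported by Yun et al. (91.10%) on the same dataset.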
Deep models enable effective human interaction recognition from monocular videos.
Abstract
Human interaction recognition is a challenging problem in computer vision and has been researched over the years due to its important applications. With the development of deep models for human pose estimation, this work aims to verify the effectiveness of using the human pose to recognize human interactions in monocular videos. The paper develops a five-step method: detect each person in the scene, track them, retrieve the human pose, extract features based on the pose, and finally recognize the interaction using a classifier. The Two-Person interaction dataset was used to develop this methodology. Using a whole-sequence evaluation approach, the method achieved an average accuracy of 87.56% over all interactions. Yun et al. achieved 91.10% on the same dataset; however, their methodology used a depth sensor to recognize the interaction. The methodology…
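The last two steps of the pipeline (pose-based features, then a sequence-level decision) can be illustrated with a minimal sketch. The abstract does not specify the features or how the whole-sequence decision is made, so everything here is an assumption for illustration: the function names `pair_features` and `sequence_label`, the `(J, 2)` joint layout, the choice of inter-person joint distances as features, and the majority vote over per-frame predictions are all hypothetical and not taken from the paper.

```python
import numpy as np

def pair_features(pose_a, pose_b):
    """Inter-person joint distances for one frame (illustrative feature choice).

    pose_a, pose_b: (J, 2) arrays of 2D joint coordinates, one per person.
    Returns a flat vector of J*J scale-normalised distances.
    """
    pose_a = np.asarray(pose_a, dtype=float)
    pose_b = np.asarray(pose_b, dtype=float)
    # Distance between every joint of person A and every joint of person B.
    diff = pose_a[:, None, :] - pose_b[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)  # shape (J, J)
    # Normalise by the diagonal of the box enclosing both skeletons,
    # so the features do not depend on how large the people appear.
    all_pts = np.vstack([pose_a, pose_b])
    scale = np.linalg.norm(all_pts.max(axis=0) - all_pts.min(axis=0))
    return (dists / max(scale, 1e-8)).ravel()

def sequence_label(frame_labels):
    """Assumed whole-sequence rule: majority vote over per-frame predictions."""
    values, counts = np.unique(frame_labels, return_counts=True)
    return values[np.argmax(counts)]
```

In this sketch, any off-the-shelf classifier could be trained on the per-frame vectors from `pair_features`, with `sequence_label` turning its per-frame outputs into one label per video; the paper's actual classifier and aggregation may differ.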
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Video Surveillance and Tracking Methods
