Hand-Object Interaction and Precise Localization in Transitive Action Recognition
Amir Rosenfeld, Shimon Ullman

TL;DR
This paper improves action recognition in still images by precisely localizing the action object and the actor-object interaction, with a focus on face-related actions. It uses a coarse-to-fine approach built on semantic segmentation and reports a 35% average relative improvement over state-of-the-art methods.
Contribution
The paper introduces a coarse-to-fine localization method that combines semantic segmentation with context to improve recognition of transitive actions.
Findings
The method achieves a 35% average relative improvement over state-of-the-art methods.
Precise localization of the action object improves recognition accuracy.
The approach is validated on face-related action categories.
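To make the "relative improvement" figure concrete, here is a minimal sketch of how such a number is usually computed. The function name and the accuracy values are hypothetical and chosen only to illustrate the arithmetic; they are not results from the paper, and the summary does not say how the average was taken.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical accuracies, not taken from the paper.
gain = relative_improvement(0.40, 0.54)
print(f"relative improvement: {gain:.0%}")
```

A relative improvement is measured against the baseline score, so it differs from the absolute gain in percentage points (here 14 points on a baseline of 40).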
Abstract
Action recognition in still images has seen major improvement in recent years due to advances in human pose estimation, object recognition and stronger feature representations produced by deep neural networks. However, there are still many cases in which performance remains far from that of humans. A major difficulty arises in distinguishing between transitive actions in which the overall actor pose is similar, and recognition therefore depends on details of the grasp and the object, which may be largely occluded. In this paper we demonstrate how recognition is improved by obtaining precise localization of the action-object and consequently extracting details of the object shape together with the actor-object interaction. To obtain exact localization of the action object and its interaction with the actor, we employ a coarse-to-fine approach which combines semantic segmentation and…
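The general idea of coarse-to-fine localization can be sketched on a toy score map. This is a generic two-stage illustration, not the authors' pipeline: the function `coarse_to_fine_peak`, the block pooling, and the example scores are all assumptions made for this sketch, and the paper's method works with semantic segmentation rather than a plain grid search.

```python
from typing import List, Tuple

Grid = List[List[float]]


def coarse_to_fine_peak(score: Grid, block: int) -> Tuple[int, int]:
    """Locate a high-scoring pixel in two stages (hypothetical helper).

    Coarse stage: sum scores over non-overlapping block x block cells
    and keep the cell with the largest total.
    Fine stage: search only inside that cell for the best pixel.
    """
    h, w = len(score), len(score[0])
    best_cell, best_sum = (0, 0), float("-inf")
    for r0 in range(0, h, block):
        for c0 in range(0, w, block):
            total = sum(
                score[r][c]
                for r in range(r0, min(r0 + block, h))
                for c in range(c0, min(c0 + block, w))
            )
            if total > best_sum:
                best_cell, best_sum = (r0, c0), total

    # Fine stage: restrict the per-pixel search to the winning cell.
    r0, c0 = best_cell
    candidates = [
        (r, c)
        for r in range(r0, min(r0 + block, h))
        for c in range(c0, min(c0 + block, w))
    ]
    return max(candidates, key=lambda rc: score[rc[0]][rc[1]])


# Toy per-pixel "object likelihood" map; values are made up.
toy_scores = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.9],
    [0.0, 0.0, 0.3, 0.4],
]
print(coarse_to_fine_peak(toy_scores, block=2))
```

The coarse stage narrows attention to a region that is likely to contain the object, so the fine stage can spend its effort on details inside that region only.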
