TL;DR
This paper evaluates and improves hand segmentation in egocentric videos, introduces new datasets for in-the-wild scenarios, and demonstrates that accurate hand maps enhance activity recognition accuracy.
Contribution
It fine-tunes RefineNet, a state-of-the-art semantic segmentation model, for hand segmentation, creates new datasets for diverse environments, and shows improved activity recognition using refined hand segmentation.
Findings
RefineNet outperforms other segmentation methods on hand datasets.
New datasets enable in-the-wild hand segmentation evaluation.
Accurate hand maps improve activity recognition accuracy.
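To make the comparison between segmentation methods concrete, here is a minimal sketch of scoring a predicted hand mask against a ground-truth mask. This summary does not say which metric the paper uses; intersection-over-union (IoU) is assumed here because it is a common choice for binary segmentation. The function name `hand_iou` and the toy masks are hypothetical and serve only as illustration.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values, where 1 marks a hand pixel.
Mask = Sequence[Sequence[int]]


def hand_iou(pred: Mask, truth: Mask) -> float:
    """Intersection-over-union of two binary hand masks of equal shape.

    Assumption: IoU is used as the score; the summary above does not
    name the metric reported in the paper.
    """
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            p_on, t_on = bool(p), bool(t)
            inter += p_on and t_on   # pixel is hand in both masks
            union += p_on or t_on    # pixel is hand in at least one mask
    # Two empty masks agree perfectly, so treat that case as IoU = 1.
    return inter / union if union else 1.0


# Toy 2x3 masks (made-up values, not data from the paper).
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
print(hand_iou(pred, truth))
```

Averaging this score over all frames of a dataset gives one number per method, which is one way a ranking such as "RefineNet outperforms other methods" could be produced.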
Abstract
A large number of works in egocentric vision have concentrated on action and object recognition. Detection and segmentation of hands in first-person videos, however, has been less explored. For many applications in this domain, it is necessary to accurately segment not only the hands of the camera wearer but also the hands of others with whom he is interacting. Here, we take an in-depth look at the hand segmentation problem. In the quest for robust hand segmentation methods, we evaluated the performance of state-of-the-art semantic segmentation methods, off the shelf and fine-tuned, on existing datasets. We fine-tune RefineNet, a leading semantic segmentation method, for hand segmentation and find that it does much better than the best contenders. Existing hand segmentation datasets are collected in laboratory settings. To overcome this limitation, we contribute by collecting two…
