TL;DR
The paper introduces GRAB, a comprehensive dataset capturing full-body human interactions with objects, including detailed 3D shapes, poses, and contact information, enabling advanced modeling of human-object manipulation.
Contribution
It presents the first large-scale, detailed whole-body grasping dataset with 3D meshes and contact data, and demonstrates its utility with a generative grasp prediction model.
Findings
The GRAB dataset includes 10 subjects interacting with 51 everyday objects, with full 3D body shape and pose sequences and body-object contact computed over time.
The dataset enables modeling of full-body involvement in object manipulation.
A baseline model, GrabNet, successfully predicts 3D hand grasps for unseen objects.
Abstract
Training computers to understand, model, and synthesize human grasping requires a rich dataset containing complex 3D object shapes, detailed contact information, hand pose and shape, and the 3D body motion over time. While "grasping" is commonly thought of as a single hand stably lifting an object, we capture the motion of the entire body and adopt the generalized notion of "whole-body grasps". Thus, we collect a new dataset, called GRAB (GRasping Actions with Bodies), of whole-body grasps, containing full 3D shape and pose sequences of 10 subjects interacting with 51 everyday objects of varying shape and size. Given MoCap markers, we fit the full 3D body shape and pose, including the articulated face and hands, as well as the 3D object pose. This gives detailed 3D meshes over time, from which we compute contact between the body and object. This is a unique dataset that goes well…
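The abstract says contact is computed from the fitted body and object meshes over time. The exact criterion is not given here, so the following is only a minimal sketch under a stated assumption: a body vertex counts as "in contact" when its nearest object vertex lies within a distance threshold. The function names and the 5 mm default threshold are hypothetical, and a real pipeline would measure distance to the object surface (not just its vertices) and use a spatial index instead of brute force.

```python
import math
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def contact_mask(body_verts: Sequence[Vertex],
                 object_verts: Sequence[Vertex],
                 threshold: float = 0.005) -> List[bool]:
    """Hypothetical contact test for one frame.

    Marks a body vertex as in contact if any object vertex lies within
    `threshold` (assumed to be metres). Brute-force O(N * M) search.
    """
    mask = []
    for v in body_verts:
        nearest = min(math.dist(v, o) for o in object_verts)
        mask.append(nearest <= threshold)
    return mask


def contact_over_time(body_frames: Sequence[Sequence[Vertex]],
                      object_frames: Sequence[Sequence[Vertex]],
                      threshold: float = 0.005) -> List[List[bool]]:
    """Apply the per-frame test to a sequence of (body, object) mesh frames."""
    return [contact_mask(b, o, threshold)
            for b, o in zip(body_frames, object_frames)]
```

For example, `contact_mask([(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)], [(0.003, 0.0, 0.0), (1.0, 1.0, 1.0)])` returns `[True, False]`: the first body vertex is 3 mm from the object, the second about 9.7 cm away.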
Taxonomy
Methods: Conditional Variational Autoencoder
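The taxonomy tags the method as a conditional variational autoencoder (CVAE), the model family behind the GrabNet baseline. The sketch below shows only the generic CVAE ingredients (reparameterized sampling, the closed-form KL term against a standard normal prior, and conditioning by concatenation); it is not GrabNet's architecture. The object feature vector, the concatenation scheme, and the `beta` weight are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with sigma = exp(0.5 * logvar), eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def condition(z: Sequence[float], object_code: Sequence[float]) -> List[float]:
    """Hypothetical conditioning: concatenate the latent with an object feature."""
    return list(z) + list(object_code)


def cvae_loss(recon_error: float, mu: Sequence[float],
              logvar: Sequence[float], beta: float = 1.0) -> float:
    """Reconstruction error plus a beta-weighted KL regularizer."""
    return recon_error + beta * kl_to_standard_normal(mu, logvar)


def sample_prior(dim: int, rng: random.Random) -> List[float]:
    """At test time, draw z ~ N(0, I) and decode it together with an unseen object's features."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]
```

Sampling from the prior at test time is what lets a trained CVAE produce several different candidate grasps for the same unseen object, since each draw of `z` decodes to a different output for the same object condition.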
