GenHeld: Generating and Editing Handheld Objects
Chaerin Min, Srinath Sridhar

TL;DR
GenHeld introduces a novel method for synthesizing and editing objects held by hands in both 3D models and 2D images, advancing the realism and plausibility of grasp representations in robotics and vision.
Contribution
The paper presents GenHeld, a new approach for generating and editing held objects conditioned on hand models or images, combining object code selection and diffusion-based image editing.
Findings
Outperforms baselines in plausibility of held object synthesis
Achieves high-quality results in both 3D and 2D scenarios
Demonstrates effective object placement and orientation without altering hand pose
Abstract
Grasping is an important human activity that has long been studied in robotics, computer vision, and cognitive science. Most existing works study grasping from the perspective of synthesizing hand poses conditioned on 3D or 2D object representations. We propose GenHeld to address the inverse problem of synthesizing held objects conditioned on a 3D hand model or a 2D image. Given a 3D model of a hand, GenHeld 3D can select a plausible held object from a large dataset using compact object representations called object codes. The selected object is then positioned and oriented to form a plausible grasp without changing the hand pose. If only a 2D hand image is available, GenHeld 2D can edit this image to add or replace a held object. GenHeld 2D operates by combining the abilities of GenHeld 3D with diffusion-based image editing. Results and experiments show that we outperform baselines and can…
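The object selection step described in the abstract can be pictured as a nearest-neighbor lookup over object codes. The sketch below is illustrative only: the abstract does not say how object codes are computed or compared, so the cosine-similarity ranking, the idea of a query code derived from the hand, and the names `cosine_similarity` and `select_objects` are all assumptions rather than the authors' method.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_objects(query_code, object_codes, top_k=1):
    """Return indices of the top_k object codes closest to query_code.

    Assumption: the hand has already been mapped to a query vector that
    lives in the same space as the object codes (hypothetical setup).
    """
    scored = [
        (cosine_similarity(query_code, code), idx)
        for idx, code in enumerate(object_codes)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]

# Toy 3-dimensional codes standing in for a large object dataset.
codes = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
best = select_objects([0.9, 0.1, 0.0], codes, top_k=2)
```

In a pipeline of this shape, the selected object would still need to be positioned and oriented against the fixed hand pose, which this sketch does not attempt.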
Taxonomy
Topics: Advanced Data Storage Technologies · Digital Rights Management and Security · Semantic Web and Ontologies
