Viewpoint-Agnostic Grasp Pipeline using VLM and Partial Observations

Dilermando Almeida; Juliano Negri; Guilherme Lazzarini; Thiago H. Segreto; Ranulfo Bezerra; Ricardo V. Godoy; Marcelo Becker

arXiv:2603.07866 · cs.RO · May 6, 2026

TL;DR

This paper introduces a language-guided grasping pipeline for legged mobile manipulators that handles occlusions and partial observations, improving grasp success rates in cluttered environments.

Contribution

The pipeline combines open-vocabulary detection with promptable instance segmentation, two-stage point cloud completion, and safety-oriented grasp-selection heuristics, and it is more robust than the baseline method.

Findings

1. Achieved a 90% success rate in cluttered scenarios

2. Outperformed the baseline, which reached a 30% success rate

3. Handled occlusions and partial observations

Abstract

Robust grasping in cluttered, unstructured environments remains challenging for mobile legged manipulators due to occlusions that lead to partial observations, unreliable depth estimates, and the need for collision-free, execution-feasible approaches. In this paper we present an end-to-end pipeline for language-guided grasping that bridges open-vocabulary target selection to safe grasp execution on a real robot. Given a natural-language command, the system grounds the target in RGB using open-vocabulary detection and promptable instance segmentation, extracts an object-centric point cloud from RGB-D, and improves geometric reliability under occlusion via back-projected depth compensation and two-stage point cloud completion. We then generate and collision-filter 6-DoF grasp candidates and select an executable grasp using safety-oriented heuristics that account for reachability, approach…
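
The step of extracting an object-centric point cloud from RGB-D can be illustrated with a minimal sketch. It assumes a standard pinhole camera model and a binary segmentation mask already aligned with the depth image. The function name `masked_depth_to_points`, the intrinsics parameters (`fx`, `fy`, `cx`, `cy`), and the depth range limits are hypothetical names chosen for illustration. This is not the authors' implementation, and it does not reproduce their depth compensation or point cloud completion stages.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def masked_depth_to_points(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    min_depth: float = 0.1,   # assumed lower bound on valid depth, in meters
    max_depth: float = 3.0,   # assumed upper bound on valid depth, in meters
) -> List[Point]:
    """Back-project masked depth pixels into 3D camera-frame points.

    Uses the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels outside the mask or with invalid depth are skipped.
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            if not keep:
                continue  # pixel does not belong to the target object
            # Rejects zeros, out-of-range values, and NaN (comparisons with NaN are False).
            if not (min_depth <= z <= max_depth):
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Dropping invalid depth readings in this way leaves holes in the object point cloud wherever the sensor fails or the object is occluded, which is the kind of gap the abstract's depth compensation and completion stages are meant to address.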
