Robot Manipulation in Salient Vision through Referring Image Segmentation and Geometric Constraints

Chen Jiang; Allie Luo; Martin Jagersand

arXiv:2409.11518·cs.RO·September 19, 2024

TL;DR

This paper introduces CLIPU$^2$Net, a lightweight vision-language model for precise robot manipulation using referring image segmentation and geometric constraints, enabling effective real-world robot control.

Contribution

The paper presents a novel compact referring image segmentation model and integrates it into a visual servoing system for improved robot manipulation based on language cues.

Findings

01. Outperforms traditional visual servoing methods in real-world tasks
02. Achieves fine-grain segmentation with a small model size of 6.6 MB
03. Supports diverse robot control scenarios

Abstract

In this paper, we perform robot manipulation activities in real-world environments with language contexts by integrating a compact referring image segmentation model into the robot's perception module. First, we propose CLIPU$^{2}$Net, a lightweight referring image segmentation model designed for fine-grain boundary and structure segmentation from language expressions. Then, we deploy the model in an eye-in-hand visual servoing system to enact robot control in the real world. The key to our system is the representation of salient visual information as geometric constraints, linking the robot's visual perception to actionable commands. Experimental results on 46 real-world robot manipulation tasks demonstrate that our method outperforms traditional visual servoing methods relying on labor-intensive feature annotations, excels in fine-grain referring image segmentation with a compact…
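
The abstract's idea of turning a segmentation mask into a geometric constraint that drives an eye-in-hand camera can be illustrated with a minimal sketch. This is not the authors' controller: it assumes a calibrated pinhole camera, a known feature depth, a point-to-point constraint built from the mask centroid, and the textbook image-based visual servoing law. All function names and parameters (`mask_centroid`, `servo_step`, `focal_px`, `principal_px`, `gain`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def mask_centroid(mask):
    """Return the (u, v) pixel centroid of a non-empty binary mask."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask contains no foreground pixels")
    return np.array([cols.mean(), rows.mean()])


def normalize(point_px, focal_px, principal_px):
    """Convert pixel coordinates to normalized image coordinates (assumed pinhole model)."""
    offset = np.asarray(point_px, dtype=float) - np.asarray(principal_px, dtype=float)
    return offset / focal_px


def interaction_matrix(x, y, depth):
    """Textbook 2x6 interaction matrix of a point feature at (x, y) with depth Z."""
    return np.array([
        [-1.0 / depth, 0.0, x / depth, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / depth, y / depth, 1.0 + y * y, -x * y, -x],
    ])


def servo_step(source_px, target_px, depth, focal_px, principal_px, gain=0.5):
    """One proportional step that drives the point-to-point error toward zero.

    Returns a camera twist (vx, vy, vz, wx, wy, wz) in the camera frame.
    """
    x, y = normalize(source_px, focal_px, principal_px)
    error = np.array([x, y]) - normalize(target_px, focal_px, principal_px)
    L = interaction_matrix(x, y, depth)
    return -gain * np.linalg.pinv(L) @ error


# Illustrative use with a synthetic mask standing in for a segmentation output.
mask = np.zeros((480, 640), dtype=bool)
mask[100:120, 200:240] = True
twist = servo_step(mask_centroid(mask), (320, 240), depth=0.5,
                   focal_px=600.0, principal_px=(320, 240))
```

The constraint here is the simplest one available (a single point coinciding with a target pixel). Richer constraints such as point-to-line or line-to-line would change only how the error vector and its Jacobian are built, not the proportional control structure.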

Taxonomy

Topics: Robot Manipulation and Learning