Innovative Integration of Visual Foundation Model with a Robotic Arm on a Mobile Platform
Shimian Zhang, Qiuhong Lu

TL;DR
This paper presents a novel mobile robotic system integrating the Segment Anything visual foundation model with a robotic arm, enabling dynamic object segmentation, tracking, grasping, and intuitive user interaction in diverse environments.
Contribution
It introduces a new system combining SAM with a robotic arm on a mobile platform, enhancing adaptability and interaction in robotic applications.
Findings
Effective object tracking and grasping demonstrated in real-world tests
Enhanced user interaction through multimodal commands
System adaptable to various dynamic environments
Abstract
In the rapidly advancing field of robotics, the fusion of state-of-the-art visual technologies with mobile robotic arms has emerged as a critical integration. This paper introduces a novel system that combines the Segment Anything model (SAM) -- a transformer-based visual foundation model -- with a robotic arm on a mobile platform. Mounting a depth camera on the robotic arm's end-effector ensures continuous object tracking, significantly mitigating environmental uncertainties. Deployed on a mobile platform, our grasping system gains enhanced mobility, which plays a key role in dynamic environments where adaptability is critical. This synthesis enables dynamic object segmentation, tracking, and grasping. It also elevates user interaction, allowing the robot to intuitively respond to various modalities such as clicks, drawings, or voice commands, beyond traditional…
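The abstract pairs a segmentation mask with a depth camera on the end-effector. One plausible way to turn that pair into a 3D grasp target is pinhole back-projection of the masked pixels, sketched below. This is an illustrative sketch, not the authors' implementation: the function name `mask_to_grasp_point`, the use of the mask centroid and median depth, and the intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions, and the mask is taken as a plain 2D boolean grid rather than the output of any particular SAM API.

```python
from statistics import median


def mask_to_grasp_point(mask, depth, fx, fy, cx, cy):
    """Back-project a segmented object to a 3D point in the camera frame.

    mask  : 2D list of bools, True where the segmentation marked the object
    depth : 2D list of floats in metres; 0.0 means "no depth reading"
    fx, fy, cx, cy : pinhole intrinsics (hypothetical, from calibration)

    Returns (x, y, z) in metres, or None if no masked pixel has valid depth.
    """
    us, vs, zs = [], [], []
    for v, (mask_row, depth_row) in enumerate(zip(mask, depth)):
        for u, (inside, z) in enumerate(zip(mask_row, depth_row)):
            if inside and z > 0.0:
                us.append(u)
                vs.append(v)
                zs.append(z)
    if not zs:
        return None

    # Centroid of the valid masked pixels, in image coordinates.
    u_c = sum(us) / len(us)
    v_c = sum(vs) / len(vs)
    # Median depth is less sensitive to stray background pixels than the mean.
    z = median(zs)

    # Pinhole model: pixel offset from the principal point, scaled by depth.
    x = (u_c - cx) * z / fx
    y = (v_c - cy) * z / fy
    return (x, y, z)


if __name__ == "__main__":
    mask = [[False, False, False],
            [False, True, True],
            [False, False, False]]
    depth = [[0.5] * 3 for _ in range(3)]
    print(mask_to_grasp_point(mask, depth, fx=1.0, fy=1.0, cx=1.0, cy=1.0))
```

Because the camera rides on the end-effector, the returned point is expressed in the camera frame; a real system would still need a hand-eye calibration transform to express it in the arm's base frame before planning a grasp.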
Taxonomy
Topics: 3D Modeling in Geospatial Applications · Robotic Path Planning Algorithms · Simulation and Modeling Applications
