Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection

Huiyi Wang; Fahim Shahriar; Alireza Azimi; Gautham Vasan; Rupam Mahmood; Colin Bellinger

arXiv:2507.10814 · cs.RO · July 16, 2025

Open Access

TL;DR

This paper introduces a goal-conditioned reinforcement learning framework that uses pre-trained object detection models to give a robot versatile and generalizable reach and grasp capabilities, reaching roughly 90% grasp success on diverse objects and converging faster to higher returns in simulated tasks.

Contribution

The paper integrates pre-trained object detection into goal-conditioned RL, enabling object-agnostic reach and grasp with better generalization and faster learning.

Findings

01 Achieved ~90% success rate in grasping diverse objects

02 Faster convergence to higher returns in simulated tasks

03 Effective in both in-distribution and out-of-distribution scenarios

Abstract

General-purpose robotic manipulation, including reach and grasp, is essential for deployment into households and workspaces involving diverse and evolving tasks. Recent advances propose using large pre-trained models, such as Large Language Models and object detectors, to boost robotic perception in reinforcement learning. These models, trained on large datasets via self-supervised learning, can process text prompts and identify diverse objects in scenes, an invaluable skill in RL where learning object interaction is resource-intensive. This study demonstrates how to integrate such models into Goal-Conditioned Reinforcement Learning to enable general and versatile robotic reach and grasp capabilities. We use a pre-trained object detection model to enable the agent to identify the object from a text prompt and generate a mask for goal conditioning. Mask-based goal conditioning provides…
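The mask-based goal conditioning mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the detector returns a pixel bounding box for the prompted object and that the mask is appended to the camera image as an extra channel; the excerpt does not say how the authors build or consume the mask. The names `box_to_mask` and `goal_conditioned_observation` are hypothetical, and the detector call is replaced by a hard-coded box.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels, max exclusive.
Box = Tuple[int, int, int, int]


def box_to_mask(box: Box, height: int, width: int) -> List[List[int]]:
    """Return a height x width binary mask with 1s inside the clipped box."""
    x0, y0, x1, y1 = box
    # Clip the box to the image bounds so out-of-frame detections are safe.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [
        [1 if (y0 <= r < y1 and x0 <= c < x1) else 0 for c in range(width)]
        for r in range(height)
    ]


def goal_conditioned_observation(
    image: Sequence[Sequence[Sequence[int]]],
    mask: Sequence[Sequence[int]],
) -> List[List[List[int]]]:
    """Append the goal mask as one extra channel to an H x W x C image."""
    return [
        [list(pixel) + [m] for pixel, m in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# Stand-in for a detector result on a text prompt (assumed, not a real API call).
height, width = 4, 6
image = [[(0, 0, 0) for _ in range(width)] for _ in range(height)]
box = (1, 1, 4, 3)

mask = box_to_mask(box, height, width)
obs = goal_conditioned_observation(image, mask)
```

Under this assumption the policy input is the same whichever object is named in the prompt; only the mask channel changes, which is one way a goal signal can stay independent of object identity.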

Taxonomy

Topics: Reinforcement Learning in Robotics · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning