Prompt-responsive Object Retrieval with Memory-augmented Student-Teacher Learning
Malte Mosbach, Sven Behnke

TL;DR
This paper introduces a memory-augmented student-teacher learning framework that combines promptable foundation models with reinforcement learning to enable robots to perform dexterous manipulation tasks based on high-level prompts, even with imperfect perception.
Contribution
The paper integrates promptable perception models with reinforcement learning, using memory augmentation to achieve fine-grained control in robotics.
Findings
Effective prompt-responsive manipulation in cluttered scenes
Successful implicit state estimation from imperfect detections
Demonstrated dexterous object picking with high-level prompts
Abstract
Building models responsive to input prompts represents a transformative shift in machine learning. This paradigm holds significant potential for robotics problems, such as targeted manipulation amidst clutter. In this work, we present a novel approach to combine promptable foundation models with reinforcement learning (RL), enabling robots to perform dexterous manipulation tasks in a prompt-responsive manner. Existing methods struggle to link high-level commands with fine-grained dexterous control. We address this gap with a memory-augmented student-teacher learning framework. We use the Segment-Anything 2 (SAM 2) model as a perception backbone to infer an object of interest from user prompts. While detections are imperfect, their temporal sequence provides rich information for implicit state estimation by memory-augmented models. Our approach successfully learns prompt-responsive…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Robotics and Automated Systems
