Visual Prompting for Robotic Manipulation with Annotation-Guided Pick-and-Place Using ACT
Muhammad A. Muttaqien, Tomohiro Motoda, Ryo Hanai, Yukiyasu Domae

TL;DR
This paper presents a perception-action pipeline that combines annotation-guided visual prompting with Action Chunking with Transformers (ACT) to improve robotic pick-and-place in cluttered retail environments, enhancing grasp accuracy and adaptability.
Contribution
It introduces a novel perception-action framework combining visual prompting with ACT for imitation learning in robotic manipulation, addressing challenges in dense object arrangements.
Findings
Improved grasp accuracy in cluttered environments
Enhanced adaptability of robotic pick-and-place operations
Successful demonstration of structured spatial guidance
Abstract
Robotic pick-and-place tasks in convenience stores pose challenges due to dense object arrangements, occlusions, and variations in object properties such as color, shape, size, and texture. These factors complicate trajectory planning and grasping. This paper introduces a perception-action pipeline leveraging annotation-guided visual prompting, where bounding box annotations identify both pickable objects and placement locations, providing structured spatial guidance. Instead of traditional step-by-step planning, we employ Action Chunking with Transformers (ACT) as an imitation learning algorithm, enabling the robotic arm to predict chunked action sequences from human demonstrations. This facilitates smooth, adaptive, and data-driven pick-and-place operations. We evaluate our system based on success rate and visual analysis of grasping behavior, demonstrating improved grasp accuracy and…
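The two ideas in the abstract, bounding-box visual prompts and chunked action prediction, can be illustrated with the minimal sketch below. It is not the authors' implementation. The box format `(x0, y0, x1, y1)`, the green/blue prompt colors, and the names `make_visual_prompt`, `TemporalEnsembler`, and `run_episode` are all hypothetical. The policy is a stand-in callable, not a trained ACT model. The ensembling step follows the temporal ensembling of the original ACT formulation, which weights overlapping predictions by `exp(-m * i)` with `i = 0` for the oldest one; whether this paper uses temporal ensembling, and with what `m`, is not stated in the summary and is an assumption here.

```python
import numpy as np

# Assumed prompt colors: green marks the object to pick, blue marks the placement location.
PICK_COLOR = (0, 255, 0)
PLACE_COLOR = (0, 0, 255)


def draw_box(image, box, color, thickness=2):
    """Draw the outline of an axis-aligned box (x0, y0, x1, y1), x1/y1 exclusive, in place."""
    x0, y0, x1, y1 = box
    image[y0:y0 + thickness, x0:x1] = color  # top edge
    image[y1 - thickness:y1, x0:x1] = color  # bottom edge
    image[y0:y1, x0:x0 + thickness] = color  # left edge
    image[y0:y1, x1 - thickness:x1] = color  # right edge
    return image


def make_visual_prompt(image, pick_box, place_box):
    """Return a copy of an HxWx3 camera image with pick and place boxes overlaid."""
    prompted = image.copy()
    draw_box(prompted, pick_box, PICK_COLOR)
    draw_box(prompted, place_box, PLACE_COLOR)
    return prompted


class TemporalEnsembler:
    """ACT-style temporal ensembling of overlapping action chunks.

    At step t the policy predicts actions for t .. t+k-1. The action executed at t is
    a weighted mean of every prediction made for t, with weight exp(-m * i), where
    i = 0 is the oldest prediction (so larger m favors older predictions more).
    """

    def __init__(self, chunk_size, m=0.01):
        self.k = chunk_size
        self.m = m
        self.predictions = {}  # target timestep -> list of predicted actions, oldest first

    def add_chunk(self, t, chunk):
        assert chunk.shape[0] == self.k, "chunk must contain exactly k actions"
        for i, action in enumerate(chunk):
            self.predictions.setdefault(t + i, []).append(action)

    def act(self, t):
        preds = np.stack(self.predictions.pop(t))  # shape (n_predictions, action_dim)
        w = np.exp(-self.m * np.arange(len(preds)))
        w /= w.sum()
        return (w[:, None] * preds).sum(axis=0)


def run_episode(policy, image, pick_box, place_box, steps, chunk_size, m=0.01):
    """Query a chunking policy every step and execute the ensembled action.

    `policy(prompted_image, t)` is a hypothetical stand-in returning a (k, action_dim)
    array. A real system would capture a fresh camera frame at every step; a single
    static frame is reused here to keep the sketch self-contained.
    """
    prompt = make_visual_prompt(image, pick_box, place_box)
    ensembler = TemporalEnsembler(chunk_size, m)
    actions = []
    for t in range(steps):
        ensembler.add_chunk(t, policy(prompt, t))
        actions.append(ensembler.act(t))
    return actions
```

With `m = 0` all overlapping predictions are averaged equally; increasing `m` makes the executed action lean toward earlier predictions, which smooths motion at the cost of reacting more slowly to new observations.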
Taxonomy
Topics: Robot Manipulation and Learning · Robotic Path Planning Algorithms · Social Robot Interaction and HRI
