Pixel2Catch: Multi-Agent Sim-to-Real Transfer for Agile Manipulation with a Single RGB Camera

Seongyong Kim, Junhyeon Cho, Kang-Won Lee, and Soo-Chul Lim

arXiv:2602.22733 · cs.RO · February 27, 2026

TL;DR

This paper introduces a novel multi-agent reinforcement learning approach for agile object catching using a single RGB camera, enabling sim-to-real transfer without explicit 3D object estimation.

Contribution

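It proposes a heterogeneous multi-agent framework that treats the robot arm and the multi-fingered hand as independent agents. Each agent has its own role-specific observations and rewards, and the two are trained cooperatively to improve learning stability and sim-to-real transferability.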
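The following minimal Python sketch makes the arm/hand split concrete. It shows how role-specific observations and rewards might be assembled for two cooperating agents from a single simulator state. It is a hypothetical illustration, not the authors' implementation. The `SimState` fields, the choice of which signals each agent sees, the 0.05 m grasp radius, and the shared success bonus are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimState:
    # Hypothetical per-step simulator state; field names are illustrative only.
    arm_joints: List[float]      # arm joint positions (rad)
    hand_joints: List[float]     # finger joint positions (rad), larger = more closed (assumed)
    object_pixels: List[float]   # image-space object cue, e.g. [u, v, scale]
    palm_to_object: float        # palm-to-object distance in m (privileged, sim-only)
    caught: bool                 # True once the object is held


def arm_observation(s: SimState) -> List[float]:
    # Arm agent: moves the hand into the object's path, so it receives
    # arm proprioception plus the visual cue.
    return s.arm_joints + s.object_pixels


def hand_observation(s: SimState) -> List[float]:
    # Hand agent: times the grasp, so it receives finger proprioception plus
    # the visual cue (which cues each role sees is an assumption).
    return s.hand_joints + s.object_pixels


def rewards(s: SimState, success_bonus: float = 10.0,
            grasp_radius: float = 0.05) -> Dict[str, float]:
    shared = success_bonus if s.caught else 0.0  # cooperative term paid to both agents
    arm_r = -s.palm_to_object + shared           # arm: bring the palm to the object
    # Hand: reward closing only when the object is within grasp_radius,
    # and mildly penalize closing too early.
    closure = sum(s.hand_joints) / max(len(s.hand_joints), 1)
    near = s.palm_to_object < grasp_radius
    hand_r = (closure if near else -0.1 * closure) + shared
    return {"arm": arm_r, "hand": hand_r}
```

The shared bonus is one common way to make independently acting agents cooperate. Each agent keeps its own shaping term, but both are paid when the catch succeeds.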

Findings

01. Successful sim-to-real transfer of the learned arm and hand policies

02. Effective recognition of object motion from pixel-level cues in RGB images

03. Improved learning stability for a high-DoF arm-and-hand manipulation system

Abstract

To catch a thrown object, a robot must be able to perceive the object's motion and generate control actions in a timely manner. Rather than explicitly estimating the object's 3D position, this work focuses on a novel approach that recognizes object motion using pixel-level visual information extracted from images captured by a single RGB camera. Such visual cues capture changes in the object's position and scale, allowing the policy to reason about the object's motion. Furthermore, to achieve stable learning in a high-DoF system composed of a robot arm equipped with a multi-fingered hand, we design a heterogeneous multi-agent reinforcement learning framework that defines the arm and hand as independent agents with distinct roles. Each agent is trained cooperatively using role-specific observations and rewards, and the learned policies are successfully transferred from simulation to the real world.
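A small sketch can illustrate the abstract's idea that changes in image position and scale encode motion. It assumes, although the paper does not say so, that the object is available each frame as a pixel bounding box, for example from a detector. The sketch converts the box into a normalized center `(u, v)` and an apparent size `s`, then stacks the last `k` cues together with their frame-to-frame differences. The names `box_cue` and `PixelCueHistory`, the normalization, and the history length are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def box_cue(box: Box, width: int, height: int) -> Tuple[float, float, float]:
    x0, y0, x1, y1 = box
    u = (x0 + x1) / 2 / width    # normalized horizontal center
    v = (y0 + y1) / 2 / height   # normalized vertical center
    area = max(0.0, (x1 - x0) * (y1 - y0))
    s = area ** 0.5 / (width * height) ** 0.5  # apparent size relative to the image
    return u, v, s


class PixelCueHistory:
    """Stack the last k cues plus their frame-to-frame differences (hypothetical)."""

    def __init__(self, k: int, width: int, height: int):
        self.k = k
        self.width = width
        self.height = height
        self.buf: Deque[Tuple[float, float, float]] = deque(maxlen=k)

    def update(self, box: Box) -> List[float]:
        cue = box_cue(box, self.width, self.height)
        if not self.buf:
            # Pad with the first cue so the initial differences are zero.
            self.buf.extend([cue] * self.k)
        else:
            self.buf.append(cue)  # maxlen drops the oldest cue
        frames = list(self.buf)
        feats: List[float] = [x for c in frames for x in c]  # 3 * k cue values
        for prev, cur in zip(frames, frames[1:]):             # 3 * (k - 1) deltas
            feats.extend(b - a for a, b in zip(prev, cur))
        return feats
```

A growing `s` means the object is approaching the camera, so the policy receives a depth-like signal without any explicit 3D position estimate. The deltas in `u` and `v` expose the object's velocity in image space.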

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Reinforcement Learning in Robotics