Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation

Abdulaziz Almuzairee; Rohan Patil; Dwait Bhatt; Henrik I. Christensen

arXiv:2505.04619 · cs.LG · September 1, 2025

TL;DR

This paper introduces Merge And Disentanglement (MAD), an algorithm that merges and disentangles multiple camera views in visual reinforcement learning for robotic manipulation, improving sample efficiency and robustness while enabling lightweight deployment.

Contribution

The MAD algorithm is a novel method that efficiently merges multi-view visual data and disentangles views to enhance policy robustness and sample efficiency in robotic manipulation tasks.

Findings

01. MAD improves sample efficiency in Meta-World and ManiSkill3 environments.

02. MAD produces more robust policies against camera failures.

03. MAD enables lightweight deployment of multi-view policies.

Abstract

Vision is well known for its use in manipulation, especially in visual servoing. Because the world is three-dimensional, using multiple camera views and merging them creates better representations for Q-learning and, in turn, trains more sample-efficient policies. Nevertheless, these multi-view policies are sensitive to failing cameras and can be burdensome to deploy. To mitigate these issues, we introduce a Merge And Disentanglement (MAD) algorithm that efficiently merges views to increase sample efficiency while simultaneously disentangling views by augmenting multi-view feature inputs with single-view features. This produces robust policies and allows lightweight deployment. We demonstrate the efficiency and robustness of our approach using Meta-World and ManiSkill3. For the project website and code, see https://aalmuzairee.github.io/mad
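
The abstract's core idea, merging views and then augmenting the multi-view feature inputs with single-view features, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the merge operator, so element-wise summation is assumed here, and the names `merge_views`, `mad_style_inputs`, `front`, and `wrist` are hypothetical, as are the toy feature values.

```python
from typing import List, Sequence

Vector = List[float]


def merge_views(view_features: Sequence[Vector]) -> Vector:
    """Merge per-view feature vectors into one vector.

    Assumption: element-wise summation is used as the merge operator.
    The abstract only says that views are merged "efficiently".
    """
    if not view_features:
        raise ValueError("need at least one view")
    dim = len(view_features[0])
    if any(len(f) != dim for f in view_features):
        raise ValueError("all views must share one feature dimension")
    return [sum(values) for values in zip(*view_features)]


def mad_style_inputs(view_features: Sequence[Vector]) -> List[Vector]:
    """Return the merged feature followed by each single-view feature.

    Feeding both kinds of input to the downstream networks is the
    augmentation described in the abstract; how they are weighted or
    batched during training is not specified there and is left out.
    """
    merged = merge_views(view_features)
    return [merged] + [list(f) for f in view_features]


# Hypothetical 4-dimensional features from two cameras.
front = [1.0, 0.0, 2.0, 0.5]
wrist = [0.5, 1.0, 0.0, 0.5]

inputs = mad_style_inputs([front, wrist])
for features in inputs:
    print(features)
```

Because the merged vector and every single-view vector share one dimension in this sketch, a network trained on all of them could in principle be queried with a single camera's features, which is one way to read the abstract's claims about robustness to failing cameras and lightweight deployment.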

Taxonomy

Topics: Advanced Vision and Imaging · Robot Manipulation and Learning · Reinforcement Learning in Robotics

Methods: Q-Learning