Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras
Mhairi Dunion, Stefano V. Albrecht

TL;DR
This paper introduces Multi-View Disentanglement (MVD), a self-supervised method enabling reinforcement learning agents trained with multiple cameras to perform well using only a single camera during deployment.
Contribution
The paper proposes a novel MVD approach that learns disentangled shared and private representations from multiple cameras, improving policy robustness to camera reduction in RL.
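To make the idea of shared and private representations concrete, the toy sketch below encodes two camera views and splits each feature vector into a shared part and a private part. Everything in it is an assumption made for illustration, not the authors' implementation: the linear encoder, the half-and-half split, the function names, and the mean-squared alignment term are all hypothetical stand-ins, since the summary does not specify the actual architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(image, weights):
    """Toy linear encoder (assumption): flatten the image and project it."""
    return np.tanh(image.reshape(-1) @ weights)

def split_shared_private(features):
    """Assumed convention: first half is 'shared', second half is 'private'."""
    half = features.shape[0] // 2
    return features[:half], features[half:]

def shared_alignment_loss(shared_a, shared_b):
    """Illustrative stand-in objective: mean squared gap between shared parts."""
    return float(np.mean((shared_a - shared_b) ** 2))

# Two hypothetical 8x8 grayscale views of the same time step.
image_cam1 = rng.random((8, 8))
image_cam2 = rng.random((8, 8))
weights = rng.normal(scale=0.1, size=(64, 16))

shared1, private1 = split_shared_private(encode(image_cam1, weights))
shared2, private2 = split_shared_private(encode(image_cam2, weights))

# Training would push the shared parts of different cameras together.
loss = shared_alignment_loss(shared1, shared2)

# At deployment, a policy that reads only the shared part needs one camera.
policy_input = shared1
```

In this sketch the policy consumes only the shared part, which is why any single camera from the training set could in principle supply its input.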
Findings
MVD enables RL agents to generalize from multiple cameras to a single camera.
Agents trained with MVD perform well on control tasks using only one camera.
MVD improves robustness of RL policies to camera failures or damage.
Abstract
The performance of image-based Reinforcement Learning (RL) agents can vary depending on the position of the camera used to capture the images. Training on multiple cameras simultaneously, including a first-person egocentric camera, can leverage information from different camera perspectives to improve the performance of RL. However, hardware constraints may limit the availability of multiple cameras in real-world deployment. Additionally, cameras may become damaged in the real world, preventing access to all cameras that were used during training. To overcome these hardware constraints, we propose Multi-View Disentanglement (MVD), which uses multiple cameras to learn a policy that is robust to a reduction in the number of cameras to generalise to any single camera from the training set. Our approach is a self-supervised auxiliary task for RL that learns a disentangled representation from…
Taxonomy
Topics: Digital Media Forensic Detection
