Modular Recurrence in Contextual MDPs for Universal Morphology Control
Laurens Engwegen, Daan Brinks, Wendelin Böhmer

TL;DR
This paper introduces a modular recurrent architecture for deep reinforcement learning that improves generalization to unseen robot morphologies by inferring contextual information through interactions, demonstrated on diverse MuJoCo robots.
Contribution
The paper proposes a novel modular recurrent architecture that enhances zero-shot generalization to new robot morphologies in multi-robot control tasks.
Findings
Substantially improved performance on robots with unseen dynamics, kinematics, and topologies.
Contextual information that is only partially observable can be inferred through interactions.
The improvements hold across four different environments.
Abstract
A universal controller for any robot morphology would greatly improve computational and data efficiency. By utilizing contextual information about the properties of individual robots and exploiting their modular structure in the architecture of deep reinforcement learning agents, steps have been made towards multi-robot control. Generalization to new, unseen robots, however, remains a challenge. In this paper, we hypothesize that the relevant contextual information is partially observable, but that it can be inferred through interactions for better generalization to contexts that are not seen during training. To this end, we implement a modular recurrent architecture and evaluate its generalization performance on a large set of MuJoCo robots. The results show substantially improved performance on robots with unseen dynamics, kinematics, and topologies, in four different environments.
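The idea of a modular recurrent controller can be illustrated with a minimal sketch. This is not the authors' architecture, which the abstract does not specify in detail. It assumes one small Elman-style recurrent cell whose weights are shared by every limb, so the same parameters apply to robots with any number of limbs, and each limb's hidden state can accumulate information from its own interaction history. The class name `SharedLimbRNN`, the dimensions, and the random weights are all hypothetical, and any communication between limbs is omitted.

```python
import math
import random


class SharedLimbRNN:
    """Hypothetical sketch: one recurrent cell with weights shared by all limbs."""

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.w_in = init(hidden_dim, obs_dim)       # observation -> hidden
        self.w_rec = init(hidden_dim, hidden_dim)   # hidden -> hidden (recurrence)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]  # hidden -> action

    def initial_state(self, num_limbs):
        # One zero hidden vector per limb; the limb count is free to vary per robot.
        return [[0.0] * self.hidden_dim for _ in range(num_limbs)]

    def step(self, limb_obs, hidden):
        """Apply the shared cell to every limb; return (actions, new hidden states)."""
        actions, new_hidden = [], []
        for obs, h in zip(limb_obs, hidden):
            h_next = [
                math.tanh(
                    sum(w * o for w, o in zip(row_in, obs))
                    + sum(w * x for w, x in zip(row_rec, h))
                )
                for row_in, row_rec in zip(self.w_in, self.w_rec)
            ]
            new_hidden.append(h_next)
            # One scalar action (e.g. a joint torque) per limb, squashed to [-1, 1].
            actions.append(math.tanh(sum(w * x for w, x in zip(self.w_out, h_next))))
        return actions, new_hidden


# The same parameters control a 2-limb and a 5-limb robot.
policy = SharedLimbRNN(obs_dim=3, hidden_dim=4)
for num_limbs in (2, 5):
    hidden = policy.initial_state(num_limbs)
    for t in range(3):
        limb_obs = [[0.1 * (t + 1)] * 3 for _ in range(num_limbs)]
        actions, hidden = policy.step(limb_obs, hidden)
    print(num_limbs, len(actions))
```

Because the hidden state carries over between steps, the same observation can yield different actions depending on what the limb has experienced so far, which is the mechanism by which unobserved context could in principle be inferred. A trained agent would learn these weights with a reinforcement learning algorithm rather than sample them at random.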
Taxonomy
Topics: Modular Robots and Swarm Intelligence · Reinforcement Learning in Robotics · Robot Manipulation and Learning
Methods: Sparse Evolutionary Training
