Knowledge-Guided Manipulation Using Multi-Task Reinforcement Learning
Aditya Narendra, Mukhammadrizo Maribjonov, Dmitry Makarov, Dmitry Yudin, Aleksandr Panov

TL;DR
This paper presents KG-M3PO, a multi-task reinforcement learning framework that integrates a dynamic 3D scene graph with multi-modal perception to improve robotic manipulation in complex, partially observable environments.
Contribution
It introduces a knowledge-graph-based approach that unifies perception, knowledge, and policy, with the aim of making reinforcement-learning-based manipulation scalable and generalizable.
Findings
Achieves higher success rates on manipulation tasks with occlusions and distractors.
Demonstrates improved sample efficiency and generalization to new objects.
Shows that structured world knowledge enhances control performance.
Abstract
This paper introduces Knowledge Graph based Massively Multi-task Model-based Policy Optimization (KG-M3PO), a framework for multi-task robotic manipulation in partially observable settings that unifies Perception, Knowledge, and Policy. The method augments egocentric vision with an online 3D scene graph that grounds open-vocabulary detections into a metric, relational representation. A dynamic-relation mechanism updates spatial, containment, and affordance edges at every step, and a graph neural encoder is trained end-to-end through the RL objective so that relational features are shaped directly by control performance. Multiple observation modalities (visual, proprioceptive, linguistic, and graph-based) are encoded into a shared latent space, upon which the RL agent operates to drive the control loop. The policy conditions on lightweight graph queries alongside visual and…
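The dynamic-relation mechanism and multi-modal encoding described in the abstract can be illustrated with a minimal, hypothetical sketch. Everything here is an assumption made for illustration: the `SceneObject` record, the `NEAR_DIST` threshold, the `GRASPABLE_LABELS` vocabulary, the relation names `near`/`inside`/`can_grasp`, the fixed scalar message-passing weights, and plain concatenation as the fusion step. In KG-M3PO the graph encoder has learned weights trained end-to-end through the RL objective. This sketch uses fixed weights and omits training entirely.

```python
import math
from dataclasses import dataclass

@dataclass
class SceneObject:
    # Hypothetical scene-graph node: open-vocabulary label plus metric pose and extent.
    name: str
    label: str
    center: tuple[float, float, float]
    half_extent: tuple[float, float, float]

# Hypothetical threshold (metres) and affordance vocabulary, for illustration only.
NEAR_DIST = 0.3
GRASPABLE_LABELS = {"cup", "block", "bottle"}

def is_inside(a: SceneObject, b: SceneObject) -> bool:
    """True if a's axis-aligned box lies entirely within b's box."""
    return all(
        b.center[k] - b.half_extent[k] <= a.center[k] - a.half_extent[k]
        and a.center[k] + a.half_extent[k] <= b.center[k] + b.half_extent[k]
        for k in range(3)
    )

def update_edges(objects: list[SceneObject]) -> set[tuple[str, str, str]]:
    """Recompute spatial, containment, and affordance edges from current poses (run every step)."""
    edges: set[tuple[str, str, str]] = set()
    for a in objects:
        if a.label in GRASPABLE_LABELS:
            edges.add(("gripper", "can_grasp", a.name))  # affordance edge from the agent
        for b in objects:
            if a is b:
                continue
            if math.dist(a.center, b.center) < NEAR_DIST:
                edges.add((a.name, "near", b.name))  # spatial edge
            if is_inside(a, b):
                edges.add((a.name, "inside", b.name))  # containment edge
    return edges

def message_pass(features: dict[str, list[float]], edges, w_self: float = 0.5, w_nbr: float = 0.5):
    """One round of mean-aggregation message passing; fixed scalars stand in for learned layers."""
    neighbors: dict[str, set[str]] = {n: set() for n in features}
    for src, _rel, dst in edges:
        if src in features and dst in features:  # ignore edges to non-object nodes (e.g. gripper)
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    out = {}
    for n, h in features.items():
        nbrs = neighbors[n]
        agg = [sum(features[m][k] for m in nbrs) / len(nbrs) if nbrs else 0.0 for k in range(len(h))]
        out[n] = [max(0.0, w_self * h[k] + w_nbr * agg[k]) for k in range(len(h))]  # ReLU
    return out

def mean_pool(features: dict[str, list[float]]) -> list[float]:
    """Average node embeddings into a single graph embedding."""
    vals = list(features.values())
    return [sum(v[k] for v in vals) / len(vals) for k in range(len(vals[0]))]

def fuse(graph_emb, visual, proprio, language) -> list[float]:
    """Concatenate per-modality embeddings into one latent vector for the policy."""
    return list(graph_emb) + list(visual) + list(proprio) + list(language)

if __name__ == "__main__":
    cup = SceneObject("cup_0", "cup", (0.0, 0.0, 0.05), (0.03, 0.03, 0.05))
    bin_ = SceneObject("bin_0", "bin", (0.0, 0.0, 0.1), (0.2, 0.2, 0.1))
    block = SceneObject("block_0", "block", (1.0, 0.0, 0.02), (0.02, 0.02, 0.02))
    edges = update_edges([cup, bin_, block])
    print(sorted(edges))  # cup is inside and near the bin
    cup.center = (0.8, 0.0, 0.05)  # the cup is moved: edges are rebuilt on the next step
    print(sorted(update_edges([cup, bin_, block])))  # cup is now near the block
    feats = {"cup_0": [1.0, 0.0], "bin_0": [0.0, 1.0], "block_0": [1.0, 1.0]}
    print(fuse(mean_pool(message_pass(feats, edges)), [0.1], [0.2, 0.3], [0.4]))
```

Rebuilding every edge from scratch each step keeps the sketch simple and guarantees that stale relations (such as the cup remaining "inside" the bin after it is moved) disappear; an incremental update would be cheaper for large scenes.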
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Multimodal Machine Learning Applications
