Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning
Hao Ma, Shijie Wang, Zhiqiang Pu, Siyao Zhao, Xiaolin Ai

TL;DR
This paper introduces a hierarchical, vision-based reward shaping approach using visual-language models to improve policy alignment with human common sense in multi-agent reinforcement learning, especially in complex, long-horizon tasks.
Contribution
It proposes a novel hierarchical reward shaping method leveraging visual-language models and adaptive skill selection, enhancing policy alignment without relying on rule-based rewards.
Findings
Achieves higher win rates in Google Research Football environment.
Effectively aligns policies with human common sense.
Theoretically preserves the optimal policy.
Abstract
Guiding the policy of multi-agent reinforcement learning to align with human common sense is a difficult problem, largely due to the complexity of modeling common sense as a reward, especially in complex and long-horizon multi-agent tasks. Recent works have shown the effectiveness of reward shaping, such as potential-based rewards, to enhance policy alignment. The existing works, however, primarily rely on experts to design rule-based rewards, which are often labor-intensive and lack a high-level semantic understanding of common sense. To solve this problem, we propose a hierarchical vision-based reward shaping method. At the bottom layer, a visual-language model (VLM) serves as a generic potential function, guiding the policy to align with human common sense through its intrinsic semantic understanding. To help the policy adapts to uncertainty and changes in long-horizon tasks, the top…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics
MethodsALIGN
