Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning

Hao Ma; Shijie Wang; Zhiqiang Pu; Siyao Zhao; Xiaolin Ai

arXiv:2502.13430·cs.AI·February 20, 2025

TL;DR

This paper introduces a hierarchical, vision-based reward shaping approach using visual-language models to improve policy alignment with human common sense in multi-agent reinforcement learning, especially in complex, long-horizon tasks.

Contribution

It proposes a novel hierarchical reward shaping method leveraging visual-language models and adaptive skill selection, enhancing policy alignment without relying on rule-based rewards.

Findings

1. Achieves higher win rates in the Google Research Football environment.

2. Effectively aligns policies with human common sense.

3. Theoretically preserves the optimal policy.

Abstract

Guiding the policy of multi-agent reinforcement learning to align with human common sense is a difficult problem, largely due to the complexity of modeling common sense as a reward, especially in complex and long-horizon multi-agent tasks. Recent works have shown the effectiveness of reward shaping, such as potential-based rewards, in enhancing policy alignment. Existing works, however, primarily rely on experts to design rule-based rewards, which are often labor-intensive to build and lack a high-level semantic understanding of common sense. To solve this problem, we propose a hierarchical vision-based reward shaping method. At the bottom layer, a visual-language model (VLM) serves as a generic potential function, guiding the policy to align with human common sense through its intrinsic semantic understanding. To help the policy adapt to uncertainty and changes in long-horizon tasks, the top…
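The potential-based shaping mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard form F(s, s') = gamma * Phi(s') - Phi(s), which is the textbook definition rather than anything confirmed by this page about the paper's implementation. The function `toy_potential` is a hypothetical stand-in for a VLM score of a rendered frame, and the discount value and all names are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

GAMMA = 0.99  # discount factor; an assumed value, not taken from the paper


def toy_potential(frame: Sequence[float]) -> float:
    """Hypothetical stand-in for a VLM-based potential.

    A real system would score how well a rendered frame matches a text
    description of desirable behaviour; here we just average the values.
    """
    return sum(frame) / len(frame)


def shaped_reward(reward: float, phi_s: float, phi_next: float,
                  gamma: float = GAMMA) -> float:
    """Environment reward plus the shaping term gamma * Phi(s') - Phi(s)."""
    return reward + gamma * phi_next - phi_s


def plain_return(rewards: Sequence[float], gamma: float = GAMMA) -> float:
    """Discounted return of the unshaped rewards."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def shaped_return(rewards: Sequence[float],
                  frames: List[Sequence[float]],
                  potential: Callable[[Sequence[float]], float],
                  gamma: float = GAMMA) -> float:
    """Discounted return of the shaped rewards along one trajectory.

    frames must hold one more entry than rewards (states s_0 .. s_T).
    """
    total = 0.0
    for t, r in enumerate(rewards):
        phi_s = potential(frames[t])
        phi_next = potential(frames[t + 1])
        total += gamma ** t * shaped_reward(r, phi_s, phi_next, gamma)
    return total
```

Along a trajectory of length T the shaping terms telescope, so the shaped and unshaped returns differ only by gamma^T * Phi(s_T) - Phi(s_0). That difference does not depend on the actions taken in between, which is the usual intuition behind the claim that potential-based shaping preserves the optimal policy.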

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: ALIGN