Stackelberg Meta-Learning for Strategic Guidance in Multi-Robot Trajectory Planning
Yuhan Zhao, Quanyan Zhu

TL;DR
This paper introduces a Stackelberg meta-learning framework enabling a leader robot to quickly adapt its trajectory guidance strategies for different followers in cooperative tasks, even with incomplete information.
Contribution
It formulates trajectory guidance as a dynamic Stackelberg game and applies meta-learning so that the leader can adapt rapidly to various followers, improving cooperation efficiency.
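The Stackelberg structure means the leader commits to its action first and anticipates the follower's best response. The sketch below is a heavily simplified, single-stage, discrete-action illustration of that logic and does not reproduce the paper's dynamic game. The 1-D setting, the cost functions, the target value and every function name (`follower_cost`, `leader_cost`, `best_response`, `stackelberg_solution`) are hypothetical assumptions chosen for illustration.

```python
# Hypothetical single-stage, 1-D simplification of leader-follower guidance.
# The paper's game is dynamic over a horizon; this only shows the Stackelberg logic.
TARGET = 10.0  # assumed destination the leader wants the follower to reach


def follower_cost(leader_pos, follower_move, follower_pos):
    # Assumed follower cost: stay close to the leader, but penalize large moves.
    new_pos = follower_pos + follower_move
    return (new_pos - leader_pos) ** 2 + 0.5 * follower_move ** 2


def leader_cost(leader_pos, follower_pos_after):
    # Assumed leader cost: bring the follower toward TARGET,
    # with a small penalty for drifting far from the follower.
    return (follower_pos_after - TARGET) ** 2 + 0.1 * (leader_pos - follower_pos_after) ** 2


def best_response(leader_pos, follower_pos, moves):
    # Follower observes the leader's committed position and minimizes its own cost.
    return min(moves, key=lambda m: follower_cost(leader_pos, m, follower_pos))


def stackelberg_solution(follower_pos, leader_options, moves):
    # Leader enumerates its options, predicts the follower's best response to each,
    # and keeps the option that minimizes its own cost under that response.
    best = None
    for lp in leader_options:
        m = best_response(lp, follower_pos, moves)
        c = leader_cost(lp, follower_pos + m)
        if best is None or c < best[2]:
            best = (lp, m, c)
    return best


leader_pos, follower_move, cost = stackelberg_solution(
    0.0, list(range(13)), [-2, -1, 0, 1, 2, 3, 4]
)
```

With these assumed costs the leader does not simply jump to the target: it places itself where the follower's own best response (capped at a move of 4) carries the follower furthest, which is the anticipation a Stackelberg leader exploits.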
Findings
Better generalization to different followers compared to other methods
Faster adaptation to specific followers with limited data
More effective guidance than the zero-guidance (no guidance) baseline
Abstract
Trajectory guidance requires a leader robotic agent to assist a follower robotic agent to cooperatively reach the target destination. However, planning cooperation becomes difficult when the leader serves a family of different followers and has incomplete information about the followers. There is a need for learning and fast adaptation of different cooperation plans. We develop a Stackelberg meta-learning approach to address this challenge. We first formulate the guided trajectory planning problem as a dynamic Stackelberg game to capture the leader-follower interactions. Then, we leverage meta-learning to develop cooperative strategies for different followers. The leader learns a meta-best-response model from a prescribed set of followers. When a specific follower initiates a guidance query, the leader quickly adapts to the follower-specific model with a small amount of learning data…
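The two-phase idea in the abstract (meta-learn a best-response model over a prescribed set of followers, then adapt quickly to a specific follower from little data) can be illustrated with a toy sketch. This is not the authors' method: it swaps in batched Reptile as the meta-learning algorithm and uses a scalar linear best-response model, follower action = `w` times a leader signal. The follower gains, data points, learning rates and all function names are hypothetical assumptions.

```python
# Toy sketch: batched Reptile over a hypothetical family of followers.
# Best-response model (assumed): follower_action = w * leader_signal.


def predict(w, xs):
    return [w * x for x in xs]


def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    # d/dw of mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def adapt(w, xs, ys, lr=0.4, steps=3):
    # Inner loop: a few gradient steps on one follower's (small) dataset.
    for _ in range(steps):
        w -= lr * mse_grad(w, xs, ys)
    return w


def reptile(tasks, meta_iters=200, meta_lr=0.1):
    # Outer loop: move the shared initialization toward the average
    # of the follower-adapted parameters (batched Reptile).
    phi = 0.0
    for _ in range(meta_iters):
        adapted = [adapt(phi, xs, ys) for xs, ys in tasks]
        phi += meta_lr * (sum(adapted) / len(adapted) - phi)
    return phi


LEADER_SIGNALS = [-1.0, -0.5, 0.5, 1.0]
TRAIN_GAINS = [1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical prescribed follower types
tasks = [(LEADER_SIGNALS, predict(g, LEADER_SIGNALS)) for g in TRAIN_GAINS]
phi = reptile(tasks)  # meta-learned initialization ("meta-best-response")

# A new follower issues a guidance query; only two interactions are observed.
query_xs = [0.5, 1.0]
query_ys = predict(2.6, query_xs)
w_meta = adapt(phi, query_xs, query_ys)      # adapt from the meta-learned init
w_scratch = adapt(0.0, query_xs, query_ys)   # same budget, no meta-learning
```

Under the same three-step adaptation budget, starting from the meta-learned initialization leaves a much smaller prediction error on the new follower than starting from zero, which is the kind of fast adaptation with limited data that the abstract describes.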
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
