Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations
Keisuke Fujii, Kazushi Tsutsui, Atom Scott, Hiroshi Nakahara, Naoya Takeishi, Yoshinobu Kawahara

TL;DR
This paper introduces an adaptive action supervision method in reinforcement learning that leverages real-world multi-agent demonstrations, effectively bridging the domain gap between real-world data and simulated environments.
Contribution
It proposes a novel approach combining RL and supervised learning using dynamic time warping to select actions, enhancing reproducibility and generalization in multi-agent RL tasks.
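As a rough illustration of what "combining RL and supervised learning" can look like, the sketch below adds a supervised term on the demonstration action to a squared temporal-difference error. The exact objective used in the paper is not given on this page, so the cross-entropy form, the fixed `weight` parameter, and both function names are assumptions made only for illustration.

```python
import math


def supervised_action_loss(q_values, demo_action):
    """Cross-entropy between softmax(q_values) and the demonstrated action.

    Assumption: the supervised term is a softmax cross-entropy over
    Q-values; the paper's actual supervised term may differ.
    """
    m = max(q_values)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(q - m) for q in q_values))
    return log_z - q_values[demo_action]


def combined_loss(td_error, q_values, demo_action, weight=1.0):
    """Squared TD error plus a weighted supervised action term.

    `weight` is a hypothetical trade-off coefficient between
    reward-based learning and imitation.
    """
    rl_term = td_error ** 2
    return rl_term + weight * supervised_action_loss(q_values, demo_action)
```

Setting `weight` to zero recovers the pure RL term, while a large `weight` pushes the policy toward imitating the demonstration.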
Findings
Achieved a balance between imitation and reward-based learning in experiments.
Successfully applied the method to football tracking data with large domain gaps.
Demonstrated improved performance over baseline methods.
Abstract
Modeling of real-world biological multi-agents is a fundamental problem in various scientific and engineering fields. Reinforcement learning (RL) is a powerful framework to generate flexible and diverse behaviors in cyberspace; however, when modeling real-world biological multi-agents, there is a domain gap between behaviors in the source (i.e., real-world data) and the target (i.e., cyberspace for RL), and the source environment parameters are usually unknown. In this paper, we propose a method for adaptive action supervision in RL from real-world demonstrations in multi-agent scenarios. We adopt an approach that combines RL and supervised learning by selecting actions of demonstrations in RL based on the minimum distance of dynamic time warping for utilizing the information of the unknown source dynamics. This approach can be easily applied to many existing neural network…
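The selection step described in the abstract (choosing demonstration actions by minimum dynamic time warping distance) can be sketched roughly as follows. This is a minimal, unoptimized version of textbook DTW with a Euclidean point cost; the function names, the `"states"`/`"actions"` dictionary layout, and the choice to compare state trajectories are assumptions for illustration, not the authors' implementation.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    `a` and `b` are sequences of points with the same dimension;
    the per-step cost is the Euclidean distance between points.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def select_demo_actions(agent_states, demos):
    """Return (index, actions) of the demonstration nearest in DTW.

    `demos` is a hypothetical list of dicts with "states" and "actions".
    """
    best = min(range(len(demos)),
               key=lambda k: dtw_distance(agent_states, demos[k]["states"]))
    return best, demos[best]["actions"]
```

Because DTW allows one trajectory to pause or speed up relative to the other, two trajectories that trace the same path at different speeds can still be matched closely, which is useful when the real-world and simulated dynamics differ.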
Taxonomy
Topics: Sports Analytics and Performance · Time Series Analysis and Forecasting
Methods: Dense Connections · 1x1 Convolution · Feedforward Network · Two Time-scale Update Rule · Projection Discriminator · Non-Local Operation · Adam · Non-Local Block
