Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations

Keisuke Fujii; Kazushi Tsutsui; Atom Scott; Hiroshi Nakahara; Naoya Takeishi; Yoshinobu Kawahara

arXiv:2305.13030·cs.AI·December 20, 2023·1 cites

Open Access

TL;DR

This paper introduces an adaptive action supervision method in reinforcement learning that leverages real-world multi-agent demonstrations, effectively bridging the domain gap between real-world data and simulated environments.

Contribution

The paper combines RL with supervised learning: dynamic time warping is used to select which demonstration actions supervise the agent, which improves reproducibility and generalization in multi-agent RL tasks.
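One way to picture the combination of RL and supervised learning is a weighted sum of a temporal-difference loss and a cross-entropy term on the selected demonstration action. The sketch below is an illustration under assumptions, not the paper's confirmed objective: the function names, the use of Q-values as logits, and the `weight` parameter are all hypothetical.

```python
import math


def cross_entropy(logits, target_action):
    """Negative log-probability of target_action under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_action]


def combined_loss(q_values, td_target, taken_action, demo_action, weight=1.0):
    """Hypothetical objective: squared TD error plus weighted action supervision.

    q_values     -- Q(s, a) for every discrete action a in the current state
    td_target    -- bootstrapped target for the action the agent took
    taken_action -- index of the action executed in the simulator
    demo_action  -- index of the action selected from the demonstrations
    weight       -- assumed trade-off between reward-based and imitation terms
    """
    td_error = td_target - q_values[taken_action]
    rl_loss = 0.5 * td_error ** 2
    # Assumption: Q-values are treated as logits for the supervised term.
    supervised_loss = cross_entropy(q_values, demo_action)
    return rl_loss + weight * supervised_loss
```

Setting `weight` to zero recovers the pure reward-based loss, while larger values pull the policy toward the demonstrated actions.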

Findings

01. Achieved a balance between imitation and reward-based learning in experiments.

02. Successfully applied the method to football tracking data with large domain gaps.

03. Demonstrated improved performance over baseline methods.

Abstract

Modeling of real-world biological multi-agents is a fundamental problem in various scientific and engineering fields. Reinforcement learning (RL) is a powerful framework to generate flexible and diverse behaviors in cyberspace; however, when modeling real-world biological multi-agents, there is a domain gap between behaviors in the source (i.e., real-world data) and the target (i.e., cyberspace for RL), and the source environment parameters are usually unknown. In this paper, we propose a method for adaptive action supervision in RL from real-world demonstrations in multi-agent scenarios. We adopt an approach that combines RL and supervised learning by selecting actions of demonstrations in RL based on the minimum distance of dynamic time warping for utilizing the information of the unknown source dynamics. This approach can be easily applied to many existing neural network…
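The selection step described in the abstract (choosing demonstration actions by minimum dynamic time warping distance) can be sketched as follows. This is a minimal illustration using the textbook DTW recurrence with Euclidean point distances; the function names, the data layout of `demos`, and the choice of comparing whole state trajectories are assumptions, not details taken from the paper.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two sequences of equal-dimension state vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance (Python 3.8+)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def select_demo_actions(rl_states, demos):
    """Return (index, actions) of the demonstration nearest to rl_states.

    demos is a hypothetical list of (states, actions) pairs, one per demonstration.
    """
    best = min(range(len(demos)), key=lambda k: dtw_distance(rl_states, demos[k][0]))
    return best, demos[best][1]


# Toy usage: the second demonstration follows the same path at a different pace.
rl_states = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
demos = [
    ([(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)], ["up", "up", "up"]),
    ([(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], ["stay", "right", "right", "right"]),
]
index, actions = select_demo_actions(rl_states, demos)
```

Because DTW tolerates differences in timing, the second demonstration matches the simulated trajectory exactly even though it contains an extra time step, which is the property that makes it useful when source and target dynamics differ.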

Taxonomy

Topics: Sports Analytics and Performance · Time Series Analysis and Forecasting

Methods: Dense Connections · 1x1 Convolution · Feedforward Network · Two Time-scale Update Rule · Projection Discriminator · Non-Local Operation · Adam · Non-Local Block