Forward and inverse reinforcement learning sharing network weights and hyperparameters
Eiji Uchibe, Kenji Doya

TL;DR
This paper introduces ERIL, a model-free imitation learning method that combines forward and inverse reinforcement learning with shared network weights and hyperparameters, improving sample efficiency in complex environments.
Contribution
ERIL integrates forward and inverse RL under entropy regularization, with hyperparameters shared between the two steps, to improve the sample efficiency and interpretability of imitation learning.
Findings
ERIL outperforms baseline methods in MuJoCo environments.
ERIL demonstrates higher sample efficiency in robotic reaching tasks.
Estimated reward functions reveal individual strategies in pole-balancing tasks.
Abstract
This paper proposes a model-free imitation learning method named Entropy-Regularized Imitation Learning (ERIL) that minimizes the reverse Kullback-Leibler (KL) divergence. ERIL combines forward and inverse reinforcement learning (RL) under the framework of an entropy-regularized Markov decision process. An inverse RL step computes the log-ratio between two distributions by evaluating two binary discriminators. The first discriminator distinguishes states generated by the forward RL step from the expert's states. The second discriminator, whose structure follows from the theory of entropy regularization, distinguishes the state-action-next-state tuples generated by the learner from the expert's. One notable feature is that the second discriminator shares hyperparameters with the forward RL step, which can be used to control the discriminator's ability. A forward RL step minimizes the reverse KL…
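The idea of reading a log-ratio off a binary discriminator can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it assumes toy one-dimensional "states" drawn from two unit-variance Gaussians in place of expert and learner data, and a logistic-regression discriminator in place of a neural network. The function names (`fit_logit_discriminator`, `log_ratio`) are hypothetical. With balanced classes, the logit of a well-trained discriminator D approximates log p_expert(x) / p_learner(x), which for this toy setup is x - 0.5.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit_discriminator(expert, learner, lr=0.5, steps=300):
    """Fit D(x) = sigmoid(w * x + b) by gradient descent on the log-loss.

    Expert samples are labeled 1 and learner samples 0 (toy assumption:
    scalar states and a linear logit).
    """
    w, b = 0.0, 0.0
    data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in learner]
    n = len(data)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def log_ratio(x, w, b):
    """Estimated log p_expert(x) / p_learner(x).

    Equals log D(x) - log(1 - D(x)), i.e. the logit, when both classes
    have the same number of samples.
    """
    return w * x + b


random.seed(0)
expert_states = [random.gauss(1.0, 1.0) for _ in range(2000)]   # stand-in for expert data
learner_states = [random.gauss(0.0, 1.0) for _ in range(2000)]  # stand-in for learner data

w, b = fit_logit_discriminator(expert_states, learner_states)
# For these two Gaussians the true log-ratio is x - 0.5,
# so w should land roughly near 1 and b roughly near -0.5.
print(w, b, log_ratio(2.0, w, b))
```

In this sketch the discriminator is only a density-ratio estimator; how ERIL structures its second discriminator and shares hyperparameters with the forward RL step is described in the paper itself and is not reproduced here.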
