Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference

Xiaocong Chen; Lina Yao; Xianzhi Wang; Aixin Sun; Wenjie Zhang; Quan Z. Sheng

arXiv:2105.00822 · cs.LG · May 6, 2021

TL;DR

This paper introduces a generative inverse reinforcement learning approach that automatically learns reward functions from user behavior, enhancing adaptability and generalization in dynamic environments like recommender systems and traffic control.

Contribution

The paper presents a method that combines a discriminative actor-critic network with a Wasserstein GAN to learn reward functions automatically, rather than relying on manually defined rewards.

Findings

01 Outperforms state-of-the-art methods in multiple scenarios

02 Effectively models and explains behavioral tendencies

03 Demonstrates adaptability in dynamic environments

Abstract

Recent advances in reinforcement learning have sparked growing interest in modeling users adaptively through dynamic interactions, e.g., in reinforcement-learning-based recommender systems. The reward function is crucial for most reinforcement learning applications because it guides the optimization. However, current reinforcement-learning-based methods rely on manually defined reward functions, which cannot adapt to dynamic and noisy environments. Moreover, they generally use task-specific reward functions that sacrifice generalization ability. To address these issues, we propose a generative inverse reinforcement learning approach for modeling user behavioral preferences. Instead of using predefined reward functions, our model automatically learns rewards from users' actions, based on a discriminative actor-critic network and a Wasserstein GAN. Our model…
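
The idea of learning a reward from behavior with a Wasserstein critic, instead of hand-writing one, can be illustrated with a toy sketch. This is not the paper's architecture: it assumes a linear critic, the weight clipping of the original WGAN formulation, and synthetic feature vectors standing in for (state, action) pairs. The function names `train_linear_critic` and `learned_reward` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def train_linear_critic(expert, policy, steps=200, lr=0.05, clip=1.0):
    """Fit a linear critic f(x) = w @ x by gradient ascent on the
    Wasserstein objective E_expert[f(x)] - E_policy[f(x)].
    Weight clipping keeps w inside [-clip, clip] (assumption: original WGAN style)."""
    w = np.zeros(expert.shape[1])
    # For a linear critic, the gradient of the objective does not depend on w.
    grad = expert.mean(axis=0) - policy.mean(axis=0)
    for _ in range(steps):
        w = np.clip(w + lr * grad, -clip, clip)
    return w

def learned_reward(w, features):
    """Use the critic score as the reward signal for each feature vector."""
    return features @ w

# Synthetic stand-ins for (state, action) features; not data from the paper.
rng = np.random.default_rng(0)
expert = rng.normal(loc=1.0, scale=0.5, size=(500, 4))  # observed user behavior
policy = rng.normal(loc=0.0, scale=0.5, size=(500, 4))  # current agent behavior

w = train_linear_critic(expert, policy)
gap = learned_reward(w, expert).mean() - learned_reward(w, policy).mean()
print("mean reward gap (expert - policy):", gap)
```

In this sketch the critic assigns higher scores to behavior that resembles the observed users, so a policy trained to maximize `learned_reward` would be pushed toward that behavior. A full method would replace the linear critic with a neural network and alternate critic and policy updates.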
