Batch Reinforcement Learning from Crowds
Guoxi Zhang, Hisashi Kashima

TL;DR
This paper introduces a method for batch reinforcement learning that learns reward functions from human preferences, enabling scalable data collection without requiring expert demonstrations or explicit rewards.
Contribution
It proposes a probabilistic model to handle noisy preferences from non-experts and integrates it with reward learning, advancing scalable RL data collection methods.
Findings
Effective reward learning from crowdsourced preferences.
Robustness to noisy preference labels demonstrated on Atari datasets.
Ablation study highlights importance of collaborative label modeling.
Abstract
A shortcoming of batch reinforcement learning is that it requires rewards in the data, so it is not applicable to tasks without reward functions. Existing settings that address the lack of reward, such as behavioral cloning, rely on optimal demonstrations collected from humans. Unfortunately, ensuring optimality requires extensive expertise, which hinders the acquisition of large-scale data for complex tasks. This paper addresses the lack of reward in a batch reinforcement learning setting by learning a reward function from preferences. Generating preferences requires only a basic understanding of a task. Because it is a mental process, generating preferences is faster than performing demonstrations. Preferences can therefore be collected at scale from non-expert humans using crowdsourcing. This paper tackles a critical challenge that emerges when collecting data from non-expert humans: the noise in preferences.…
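To make the idea of learning a reward function from pairwise preferences concrete, here is a minimal sketch. It is not the authors' model: it assumes a linear reward over state features and a standard Bradley-Terry preference likelihood, and it simulates non-expert noise by flipping 20% of the labels. The truncated abstract above does not specify the paper's probabilistic model for annotator noise, so that part is not reproduced here. All names (`segment_return`, `preference_prob`, `fit_reward`) and the synthetic data are hypothetical.

```python
import numpy as np

def segment_return(w, segment):
    """Sum of linear rewards w . phi(s) over a segment of shape (T, d)."""
    return float(segment.sum(axis=0) @ w)

def preference_prob(w, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b."""
    diff = segment_return(w, seg_a) - segment_return(w, seg_b)
    return 1.0 / (1.0 + np.exp(-diff))

def fit_reward(pairs, labels, dim, lr=0.1, steps=500):
    """Gradient ascent on the mean preference log-likelihood.

    pairs:  list of (seg_a, seg_b) arrays, each of shape (T, d)
    labels: 1 if seg_a was preferred, 0 if seg_b was preferred
    """
    # Per-pair difference of summed features, so the model is a
    # logistic regression on X with no intercept.
    X = np.stack([a.sum(axis=0) - b.sum(axis=0) for a, b in pairs])
    y = np.asarray(labels, dtype=float)
    w = np.zeros(dim)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w += lr * X.T @ (y - p) / len(y)
    return w

# Synthetic data (assumption): random feature segments and a hidden reward.
rng = np.random.default_rng(0)
dim, T, n = 4, 5, 400
w_true = np.array([1.0, -0.5, 0.25, 0.0])
pairs = [(rng.normal(size=(T, dim)), rng.normal(size=(T, dim)))
         for _ in range(n)]

# Noise-free preferences under the hidden reward.
clean = [int(segment_return(w_true, a) > segment_return(w_true, b))
         for a, b in pairs]

# Simulated non-expert noise: flip each label with probability 0.2.
flip = rng.random(n) < 0.2
labels = [1 - c if f else c for c, f in zip(clean, flip)]

w_hat = fit_reward(pairs, labels, dim)

# Compare the learned reward's preferences with the noise-free ones.
agreement = np.mean([(preference_prob(w_hat, a, b) > 0.5) == bool(c)
                     for (a, b), c in zip(pairs, clean)])
cosine = float(w_hat @ w_true
               / (np.linalg.norm(w_hat) * np.linalg.norm(w_true)))
```

In this toy setting the label noise is symmetric and independent of the annotator, so a plain maximum-likelihood fit can still recover the direction of the reward. Crowdsourced labels generally vary in reliability across annotators, which is the situation the paper's collaborative label model is described as addressing.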
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Reinforcement Learning in Robotics · Data Stream Mining Techniques
