Fitting Reinforcement Learning Model to Behavioral Data under Bandits
Hao Zhu, Jasper Hoffmann, Baohe Zhang, Joschka Boedecker

TL;DR
This paper introduces a new method, based on convex relaxation, for fitting reinforcement learning models to behavioral data in bandit environments. It offers accuracy comparable to state-of-the-art methods with faster computation.
Contribution
It provides a generic optimization framework for RL model fitting, analyzes its convexity, and develops a novel, efficient solution method with an open-source implementation.
Findings
The proposed method achieves accuracy comparable to the benchmark fitting methods from the literature.
It requires significantly less computation time than those benchmarks.
These results hold in both simulated and real-world bandit environments.
Abstract
We consider the problem of fitting a reinforcement learning (RL) model to some given behavioral data under a multi-armed bandit environment. These models have received much attention in recent years for characterizing human and animal decision making behavior. We provide a generic mathematical optimization problem formulation for the fitting problem of a wide range of RL models that appear frequently in scientific research applications. We then provide a detailed theoretical analysis of its convexity properties. Based on the theoretical results, we introduce a novel solution method for the fitting problem of RL models based on convex relaxation and optimization. Our method is then evaluated in several simulated and real-world bandit environments to compare with some benchmark methods that appear in the literature. Numerical results indicate that our method achieves comparable…
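To make the fitting problem concrete, the following is a minimal sketch of one common instance: a softmax Q-learning agent on a Bernoulli bandit, fitted by maximum likelihood. It is not the paper's convex relaxation method. The model choice, the grid search, and every name in it (`simulate`, `neg_log_likelihood`, `fit_grid`, `alpha`, `beta`) are assumptions made for illustration only.

```python
import numpy as np


def softmax_log_probs(q, beta):
    """Log of the softmax choice probabilities for action values q."""
    logits = beta * q
    shifted = logits - logits.max()  # subtract the max for numerical stability
    return shifted - np.log(np.sum(np.exp(shifted)))


def simulate(alpha, beta, reward_probs, n_trials, rng):
    """Generate behavioral data from a hypothetical softmax Q-learning agent."""
    q = np.zeros(len(reward_probs))
    actions = np.empty(n_trials, dtype=int)
    rewards = np.empty(n_trials)
    for t in range(n_trials):
        p = np.exp(softmax_log_probs(q, beta))
        a = rng.choice(len(q), p=p)
        r = float(rng.random() < reward_probs[a])  # Bernoulli reward
        q[a] += alpha * (r - q[a])                 # delta-rule value update
        actions[t], rewards[t] = a, r
    return actions, rewards


def neg_log_likelihood(alpha, beta, actions, rewards, n_arms):
    """Negative log-likelihood of the observed choices under (alpha, beta)."""
    q = np.zeros(n_arms)
    nll = 0.0
    for a, r in zip(actions, rewards):
        nll -= softmax_log_probs(q, beta)[a]
        q[a] += alpha * (r - q[a])
    return nll


def fit_grid(actions, rewards, n_arms, alphas, betas):
    """Brute-force grid search; returns (best_nll, best_alpha, best_beta)."""
    best = (np.inf, None, None)
    for alpha in alphas:
        for beta in betas:
            nll = neg_log_likelihood(alpha, beta, actions, rewards, n_arms)
            if nll < best[0]:
                best = (nll, alpha, beta)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    actions, rewards = simulate(0.3, 3.0, [0.2, 0.8], 500, rng)
    nll, alpha_hat, beta_hat = fit_grid(
        actions, rewards, 2, np.linspace(0.0, 1.0, 21), np.linspace(0.5, 8.0, 16)
    )
    print(f"alpha_hat={alpha_hat:.2f}, beta_hat={beta_hat:.2f}, nll={nll:.1f}")
```

The grid search is used here only because it needs no assumptions about the shape of the objective. The likelihood of models like this one is generally not convex in the parameters, which is the kind of difficulty the abstract's convexity analysis and relaxation are aimed at.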
