A Behavioral Model for Exploration vs. Exploitation: Theoretical Framework and Experimental Evidence
Jingying Ding, Yifan Feng, Ying Rong

TL;DR
This paper introduces QCARE, a new behavioral model for the exploration-exploitation trade-off in decision-making, combining theoretical analysis and experimental validation with human data.
Contribution
The paper presents QCARE, a novel adaptive model that generalizes Thompson Sampling to better reflect human exploration and exploitation behaviors.
Findings
QCARE captures key behavioral patterns in exploration-exploitation.
QCARE outperforms existing models in predictive accuracy.
Humans tend to over-explore in decision tasks.
Abstract
How do people navigate the exploration-exploitation (EE) trade-off when making repeated choices with unknown rewards? We study this question through the lens of multi-armed bandit problems and introduce a novel behavioral model, Quantal Choice with Adaptive Reduction of Exploration (QCARE). It generalizes Thompson Sampling, allowing for a principled way to quantify the EE trade-off and reflect human decision-making patterns. The model adaptively reduces exploration as information accumulates, with the reduction rate serving as a parameter to quantify the EE trade-off dynamics. We theoretically analyze how varying reduction rates influence decision quality, shedding light on the effects of "over-exploration" and "under-exploration." Empirically, we validate QCARE through experiments collecting behavioral data from human participants. QCARE not only captures critical behavioral…
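For readers unfamiliar with the baseline that QCARE generalizes, the following is a minimal sketch of standard Thompson Sampling on a Bernoulli multi-armed bandit with uniform Beta(1, 1) priors. It is not the paper's QCARE specification, which this summary does not give. The `sharpness` parameter is a hypothetical stand-in added only to illustrate how a single knob can shift a sampler toward more or less exploration: `sharpness=1.0` recovers ordinary Thompson Sampling, smaller values flatten the posteriors (more exploration), and larger values concentrate them (less exploration). The function names `choose_arm` and `run_bandit` are likewise illustrative.

```python
import random


def choose_arm(successes, failures, rng, sharpness=1.0):
    """Sample each arm's Beta posterior and return the index of the largest draw.

    `sharpness` is a hypothetical knob (not from the paper): it rescales the
    observed counts, so 1.0 is standard Thompson Sampling with Beta(1, 1) priors.
    """
    samples = [
        rng.betavariate(1.0 + sharpness * s, 1.0 + sharpness * f)
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, horizon, sharpness=1.0, seed=0):
    """Simulate `horizon` rounds and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        arm = choose_arm(successes, failures, rng, sharpness)
        # Bernoulli reward drawn from the (unknown to the learner) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    for sharpness in (0.0, 1.0, 4.0):
        print(sharpness, run_bandit([0.2, 0.8], horizon=500, sharpness=sharpness))
```

With `sharpness=0.0` every draw comes from the uniform prior, so the learner never stops exploring; comparing pull counts across values gives a rough feel for what over- and under-exploration mean in this setting.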
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Machine Learning and Algorithms
