Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach
Erfan Mirzaei, Seyed Pooya Shariatpanahi, Alireza Tavakoli, Reshad Hosseini, Majid Nili Ahmadabadi

TL;DR
This paper introduces a novel free energy-based social bandit learning algorithm that leverages social observations to improve individual learning, especially with non-expert agents, and guarantees convergence with superior empirical performance.
Contribution
It presents a new social bandit learning method that evaluates agent expertise without external norms, integrating social and personal experiences for improved learning outcomes.
Findings
The algorithm converges to the optimal policy.
It outperforms alternative methods in various scenarios.
It maintains logarithmic regret while exploiting relevant agents.
Abstract
Personalized AI-based services involve a population of individual reinforcement learning agents. However, most reinforcement learning algorithms focus on harnessing individual learning and fail to leverage the social learning capabilities commonly exhibited by humans and animals. Social learning integrates individual experience with observing others' behavior, presenting opportunities for improved learning outcomes. In this study, we focus on a social bandit learning scenario where a social agent observes other agents' actions without knowledge of their rewards. The agents independently pursue their own policy without explicit motivation to teach each other. We propose a free energy-based social bandit learning algorithm over the policy space, where the social agent evaluates others' expertise levels without resorting to any oracle or social norms. Accordingly, the social agent…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing
