Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach

Erfan Mirzaei; Seyed Pooya Shariatpanahi; Alireza Tavakoli; Reshad Hosseini; Majid Nili Ahmadabadi

arXiv:2603.11757·cs.LG·March 13, 2026

Open Access

TL;DR

This paper introduces a free energy-based social bandit learning algorithm in which an agent uses observations of other agents' actions to improve its own learning, even when those agents are non-experts. The algorithm converges to the optimal policy and outperforms alternative methods empirically.

Contribution

It presents a social bandit learning method that evaluates other agents' expertise without relying on an oracle or social norms, and combines social observations with personal experience to improve learning outcomes.

Findings

01. The algorithm converges to the optimal policy.

02. It outperforms alternative methods in various scenarios.

03. It maintains logarithmic regret while exploiting relevant agents.

Abstract

Personalized AI-based services involve a population of individual reinforcement learning agents. However, most reinforcement learning algorithms focus on harnessing individual learning and fail to leverage the social learning capabilities commonly exhibited by humans and animals. Social learning integrates individual experience with observing others' behavior, presenting opportunities for improved learning outcomes. In this study, we focus on a social bandit learning scenario where a social agent observes other agents' actions without knowledge of their rewards. The agents independently pursue their own policy without explicit motivation to teach each other. We propose a free energy-based social bandit learning algorithm over the policy space, where the social agent evaluates others' expertise levels without resorting to any oracle or social norms. Accordingly, the social agent…
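
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm, which the truncated abstract does not specify. It relies on one standard fact: the policy minimizing the free energy F(pi) = -E_pi[Q] + (1/beta) KL(pi || prior) is pi(a) proportional to prior(a) * exp(beta * Q(a)). Everything else is an assumption made for illustration: using the other agent's smoothed empirical action frequencies as the prior, the Gaussian reward noise, and the names free_energy_policy, run, other_policy, and beta.

```python
import numpy as np


def free_energy_policy(q_values, prior, beta):
    """Minimizer of F(pi) = -E_pi[Q] + (1/beta) * KL(pi || prior).

    The closed form is pi(a) proportional to prior(a) * exp(beta * Q(a)).
    """
    logits = np.log(prior) + beta * np.asarray(q_values, dtype=float)
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def run(true_means, other_policy, steps=2000, beta=5.0, seed=0):
    """Simulate one social agent that sees another agent's actions, not its rewards.

    Assumption (illustrative only): the observed action frequencies of the
    other agent serve as the prior in the free-energy policy.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    k = len(true_means)
    q = np.zeros(k)      # the social agent's own reward estimates
    pulls = np.zeros(k)  # how often the social agent pulled each arm
    seen = np.ones(k)    # Laplace-smoothed counts of the other agent's actions
    regret = 0.0
    policy = np.full(k, 1.0 / k)
    for _ in range(steps):
        # Social observation: the other agent's action only, never its reward.
        seen[rng.choice(k, p=other_policy)] += 1
        prior = seen / seen.sum()
        # Individual step: act, receive a noisy reward, update the running mean.
        policy = free_energy_policy(q, prior, beta)
        arm = rng.choice(k, p=policy)
        reward = rng.normal(true_means[arm], 1.0)
        pulls[arm] += 1
        q[arm] += (reward - q[arm]) / pulls[arm]
        regret += true_means.max() - true_means[arm]
    return policy, regret


if __name__ == "__main__":
    # Hypothetical example: three arms, and another agent that mostly picks arm 2.
    policy, regret = run([0.1, 0.5, 0.9], other_policy=[0.1, 0.2, 0.7])
    print("final policy:", np.round(policy, 3))
    print("cumulative regret:", round(regret, 2))
```

With beta = 0 the policy reduces to the prior, so the agent simply imitates what it observes; larger beta puts more weight on its own reward estimates. This sketch trusts the observed agent unconditionally, whereas the paper's contribution is precisely to evaluate the expertise of the observed agents before exploiting them.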

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing