Multilinguality in LLM-Designed Reward Functions for Restless Bandits:   Effects on Task Performance and Fairness

Ambreesh Parthasarathy; Chandrasekar Subramanian; Ganesh Senrayan; Shreyash Adappanavar; Aparna Taneja; Balaraman Ravindran; Milind Tambe

arXiv:2501.13120·cs.CL·January 24, 2025

Open Access

TL;DR

This paper investigates how the language of a prompt affects the task performance and fairness of LLM-designed reward functions for Restless Multi-Armed Bandits (RMABs), and finds that both prompt language and prompt complexity influence performance and bias.

Contribution

It is the first study to analyze the effects of multilingual prompts on LLM-designed reward functions in RMABs, highlighting disparities tied to language resource level and the impact of prompt complexity.

Findings

01. English prompts yield better task performance than prompts in other languages.

02. Prompt phrasing significantly affects reward function quality.

03. Increased prompt complexity reduces performance and fairness, especially in low-resource languages.

Abstract

Restless Multi-Armed Bandits (RMABs) have been successfully applied to resource allocation problems in a variety of settings, including public health. With the rapid development of powerful large language models (LLMs), they are increasingly used to design reward functions to better match human preferences. Recent work has shown that LLMs can be used to tailor automated allocation decisions to community needs using language prompts. However, this has been studied primarily for English prompts and with a focus on task performance only. This can be an issue since grassroots workers, especially in developing countries like India, prefer to work in local languages, some of which are low-resource. Further, given the nature of the problem, biases along population groups unintended by the user are also undesirable. In this work, we study the effects on both task performance and fairness when…

Taxonomy

Topics: Advanced Bandit Algorithms Research

Methods: Focus