Solving Inverse Problem for Multi-armed Bandits via Convex Optimization

Hao Zhu; Joschka Boedecker

arXiv:2501.18945 · cs.CE · June 27, 2025

TL;DR

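This paper introduces a convex relaxation of the inverse multi-armed bandit problem and a two-step heuristic built on it. In numerical experiments the heuristic is more robust than repeated local optimization and matches Monte Carlo methods in significantly less running time. The problem arises in behavior modelling for neuroscience and psychology research.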

Contribution

The paper presents a convex relaxation of the inverse multi-armed bandit (IMAB) problem, obtained via variable transformation, and a two-step sequential heuristic that is more robust than directly solving the problem via repeated local optimization.

Findings

01 The heuristic method is more robust than directly solving the IMAB problem via repeated local optimization.

02 It achieves the performance of Monte Carlo methods in significantly less running time.

03 Under a stated condition, the method provides a global solution to the IMAB problem with a certificate.

Abstract

We consider the inverse problem of multi-armed bandits (IMAB), which are widely used in neuroscience and psychology research for behavior modelling. We first show that the IMAB problem is not convex in general, but can be relaxed to a convex problem via variable transformation. Based on this result, we propose a two-step sequential heuristic for (approximately) solving the IMAB problem. We discuss a condition under which our method provides a global solution to the IMAB problem with a certificate, as well as approximations that further reduce computing time. Numerical experiments indicate that our heuristic method is more robust than directly solving the IMAB problem via repeated local optimization, and can achieve the performance of Monte Carlo methods in significantly less running time. We provide an implementation of our method based on CVXPY, which allows straightforward application by…
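To make the setting concrete, here is a minimal sketch of the kind of forward model whose inverse is being discussed. It is an illustration under stated assumptions, not the paper's formulation: the abstract does not specify the agent model, so a softmax Q-learning agent with a learning rate `alpha` and an inverse temperature `beta` is assumed here because it is a common behavior model. The function names `simulate` and `neg_log_likelihood` are hypothetical, and the code uses only the Python standard library rather than the authors' CVXPY implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(alpha, beta, reward_probs, n_trials, seed=0):
    """Forward problem: generate choices from an ASSUMED softmax Q-learning agent.

    alpha is the learning rate, beta the inverse temperature, and
    reward_probs the Bernoulli reward probability of each arm.
    """
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)
    actions, rewards = [], []
    for _ in range(n_trials):
        probs = softmax([beta * v for v in q])
        a = rng.choices(range(len(q)), weights=probs)[0]
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        q[a] += alpha * (r - q[a])  # value update for the chosen arm only
        actions.append(a)
        rewards.append(r)
    return actions, rewards


def neg_log_likelihood(alpha, beta, actions, rewards, n_arms):
    """Inverse problem objective: negative log-likelihood of the observed
    choices under candidate parameters (alpha, beta)."""
    q = [0.0] * n_arms
    nll = 0.0
    for a, r in zip(actions, rewards):
        probs = softmax([beta * v for v in q])
        nll -= math.log(probs[a])
        q[a] += alpha * (r - q[a])
    return nll


if __name__ == "__main__":
    acts, rews = simulate(alpha=0.3, beta=3.0, reward_probs=[0.2, 0.8], n_trials=200)
    # Evaluate the objective at a few candidate parameter pairs.
    for a_hat, b_hat in [(0.1, 1.0), (0.3, 3.0), (0.9, 8.0)]:
        print(a_hat, b_hat, neg_log_likelihood(a_hat, b_hat, acts, rews, 2))
```

In this sketch, recovering `alpha` and `beta` from `actions` and `rewards` means minimizing `neg_log_likelihood`, which is the kind of objective a local optimizer would be restarted on repeatedly. One sanity check that holds regardless of the data: with `beta = 0` every choice is uniform, so the objective equals the number of trials times the logarithm of the number of arms.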

Code & Models

Repositories

nrgrp/cvx_imab (official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and ELM