On Bits and Bandits: Quantifying the Regret-Information Trade-off
Itai Shufaro, Nadav Merlis, Nir Weinberger, Shie Mannor

TL;DR
This paper explores the fundamental trade-off between information gained and regret suffered in sequential decision-making, using information-theoretic bounds to quantify and improve performance in tasks like question-answering with large language models.
Contribution
It introduces the first Bayesian regret lower bounds based on information accumulation and derives regret upper bounds, linking information measured in bits to regret in reward.
Findings
Information-theoretic bounds effectively quantify the regret-information trade-off.
Bayesian regret lower bounds depend on the amount of information accumulated.
Application to large language models improves question-answering performance.
Abstract
In many sequential decision problems, an agent performs a repeated task. It then suffers regret and obtains information that it may use in the following rounds. However, the agent may sometimes also obtain information, and avoid suffering regret, by querying external sources. We study the trade-off between the information an agent accumulates and the regret it suffers. We invoke information-theoretic methods for obtaining regret lower bounds, which also allow us to easily re-derive several known lower bounds. We introduce the first Bayesian regret lower bounds that depend on the information an agent accumulates. We also prove regret upper bounds using the amount of information the agent accumulates. These bounds show that information, measured in bits, can be traded off for regret, measured in reward. Finally, we demonstrate the utility of these bounds in improving the performance of a…
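The claim that bits can be traded for regret can be made concrete with a back-of-the-envelope calculation. The sketch below is not the paper's bound. It borrows the well-known information-ratio bound of Russo and Van Roy for Thompson sampling on a K-armed bandit, E[Regret(T)] ≤ sqrt(Γ · H(A*) · T), with Γ ≤ K/2 and the entropy H(A*) of the optimal arm measured in nats. It also assumes, as a simplification, a uniform prior over A* and that each bit obtained from external queries reduces H(A*) one-for-one. The function name `info_ratio_regret_bound` is hypothetical.

```python
import math


def info_ratio_regret_bound(horizon: int, num_arms: int, bits_acquired: float = 0.0) -> float:
    """Illustrative Bayesian regret upper bound of the form sqrt(Gamma * H * T).

    Assumptions (for illustration only, not taken from the paper):
      - uniform prior over the optimal arm A*, so H(A*) = log2(K) bits;
      - information ratio bounded by Gamma = K / 2 (Thompson sampling, K arms);
      - bits acquired from external queries reduce the remaining entropy of A*
        one-for-one, floored at zero.
    """
    if horizon < 0 or num_arms < 1 or bits_acquired < 0:
        raise ValueError("need horizon >= 0, num_arms >= 1, bits_acquired >= 0")
    gamma = num_arms / 2.0
    entropy_bits = math.log2(num_arms)
    # Remaining uncertainty about A*, converted from bits to nats (1 bit = ln 2 nats).
    remaining_nats = max(entropy_bits - bits_acquired, 0.0) * math.log(2)
    return math.sqrt(gamma * remaining_nats * horizon)


# Regret bound for T = 10,000 rounds and K = 8 arms as external bits are acquired.
for bits in (0, 1, 2, 3):
    print(bits, round(info_ratio_regret_bound(10_000, 8, bits), 1))
```

With K = 8 arms, the uniform prior carries 3 bits of uncertainty about A*. Under these assumptions, each acquired bit shrinks the bound, and after 3 bits it reaches zero because the agent then knows the optimal arm. This is the qualitative shape of the bits-for-regret exchange described above, not a reproduction of the paper's theorems.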
Peer Reviews
Decision: ICLR 2025 Poster
The problem formulated in this paper seems interesting, and it is interesting to see how information affects learning in general. The paper also accompanies its theoretical results with experiments.
The technique is not the strong part of this paper.
- The paper presents an interesting information-theoretic approach to quantifying the regret-information trade-off.
- The theoretical approach is rigorous, with clear definitions and proofs.
- The paper is well-organized.
- The paper holds high significance for fields involving sequential decision-making (online learning in particular).
- The assumption regarding information gathering being independent of task history could limit applicability in some environments
I think this work has many of the ingredients of a strong conceptual paper. The authors identify a conceptual phenomenon which spans many mathematical models, formalize that phenomenon, and develop a method which can analyze this phenomenon simultaneously in all of those models. Although the LLM experiment initially felt out of place to me, I actually think it provides a nice complement to the theoretical results (although the theoretical results certainly remain the primary contribution). …
I have serious concerns about the presentation. Although the conceptual idea behind the paper is intuitive, it took me a while to make sense of the technical content of the paper. I think there are two issues: 1. Confusing writing and non-standard terminology. 2. Lack of explanation of the technical statements. I have provided a non-exhaustive list of examples below. Although I am not an expert in information-theoretic methods, I am quite familiar with bandits, RL, and Bayesian regret, so more…
