Can Language Representation Models Think in Bets?
Zhisheng Tang, Mayank Kejriwal

TL;DR
This paper evaluates whether transformer-based language representation models (LRMs) can make rational decisions when decision-making is framed as a bet, and finds that their rationality depends on fine-tuning and on the question structure staying consistent.
Contribution
The study introduces a decision-making benchmark built on betting scenarios to assess LRMs' rationality, and shows that performance depends on both fine-tuning and question structure.
Findings
LRMs can 'think in bets' only after they are fine-tuned on bet questions with an identical structure.
Performance drops by over 25% when the question structure is altered.
LRMs are more rational when selecting outcomes with non-negative expected gain.
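To make the rationality criteria concrete, the sketch below computes the expected gain of each option in a bet, E[G] = Σ p·g over its outcomes, and checks whether each option is optimal, strictly positive, or non-negative. The option format (a list of (probability, gain) pairs), the names `expected_gain` and `classify`, and the sample bet are hypothetical illustrations, not the paper's benchmark format.

```python
from fractions import Fraction

# Hypothetical format: an option is a list of (probability, gain) outcomes.
Outcome = tuple[Fraction, int]


def expected_gain(option: list[Outcome]) -> Fraction:
    """Expected gain E[G] = sum of p * g over the option's outcomes."""
    return sum((p * g for p, g in option), Fraction(0))


def classify(options: list[list[Outcome]]) -> list[dict]:
    """Check each option against the optimal / positive / non-negative criteria."""
    gains = [expected_gain(o) for o in options]
    best = max(gains)
    return [
        {
            "expected_gain": g,
            "optimal": g == best,      # highest expected gain among the options
            "positive": g > 0,         # strictly positive expected gain
            "non_negative": g >= 0,    # expected gain of zero or more
        }
        for g in gains
    ]


# Hypothetical three-option bet.
bet = [
    [(Fraction(1, 2), 10), (Fraction(1, 2), -10)],  # A: E[G] = 0
    [(Fraction(1, 3), 30), (Fraction(2, 3), -6)],   # B: E[G] = 6
    [(Fraction(1, 4), 4), (Fraction(3, 4), -4)],    # C: E[G] = -2
]

for name, label in zip("ABC", classify(bet)):
    print(name, label)
```

Here B is the only optimal choice, A is non-negative without being positive, and C fails all three criteria. Exact fractions are used so that comparing expected gains is not affected by floating-point rounding.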
Abstract
In recent years, transformer-based language representation models (LRMs) have achieved state-of-the-art results on difficult natural language understanding problems, such as question answering and text summarization. As these models are integrated into real-world applications, evaluating their ability to make rational decisions is an important research agenda, with practical ramifications. This article investigates LRMs' rational decision-making ability through a carefully designed set of decision-making benchmarks and experiments. Inspired by classic work in cognitive science, we model the decision-making problem as a bet. We then investigate an LRM's ability to choose outcomes that have optimal, or at minimum, positive expected gain. Through a robust body of experiments on four established LRMs, we show that a model is only able to 'think in bets' if it is first fine-tuned on bet…
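The evaluation described in the abstract can be sketched as a scoring loop: for each bet question, the option a model picks counts as rational if it has the maximum expected gain (strict criterion) or at least a positive one (lenient criterion). The `BetQuestion` structure, the `rationality_rate` function, the criterion names, and the sample questions and model choices are all hypothetical; the paper's actual benchmark format and metrics may differ.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class BetQuestion:
    """Hypothetical bet question: each option is a list of (probability, gain) outcomes."""
    options: list[list[tuple[Fraction, int]]]

    def expected_gains(self) -> list[Fraction]:
        return [sum((p * g for p, g in opt), Fraction(0)) for opt in self.options]


def rationality_rate(questions: list[BetQuestion], choices: list[int],
                     criterion: str = "optimal") -> float:
    """Share of questions whose chosen option meets the given criterion."""
    if len(questions) != len(choices):
        raise ValueError("need exactly one choice per question")
    hits = 0
    for question, choice in zip(questions, choices):
        gains = question.expected_gains()
        if criterion == "optimal":
            ok = gains[choice] == max(gains)
        elif criterion == "positive":
            ok = gains[choice] > 0
        else:
            raise ValueError(f"unknown criterion: {criterion}")
        if ok:
            hits += 1
    return hits / len(questions)


questions = [
    BetQuestion([[(Fraction(1, 2), 10), (Fraction(1, 2), -10)],
                 [(Fraction(1, 3), 30), (Fraction(2, 3), -6)]]),  # gains [0, 6]
    BetQuestion([[(Fraction(1), 2)],
                 [(Fraction(1, 4), 4), (Fraction(3, 4), -4)]]),   # gains [2, -2]
    BetQuestion([[(Fraction(1), 1)], [(Fraction(1), 3)]]),        # gains [1, 3]
]
choices = [0, 0, 0]  # hypothetical model answers (option indices)

print(rationality_rate(questions, choices, "optimal"))
print(rationality_rate(questions, choices, "positive"))
```

With these sample choices, the optimal-only rate is 1/3 while the positive-gain rate is 2/3, which illustrates how the score depends on which rationality criterion is applied.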
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
