View From Above: A Framework for Evaluating Distribution Shifts in Model Behavior

Tanush Chopra; Michael Li; Jacob Haimes

arXiv:2407.00948·cs.CL·October 1, 2024

TL;DR

This paper introduces a domain-agnostic framework to evaluate distribution shifts in large language models' decision-making, revealing potential behavioral misalignments through systematic testing in a blackjack environment.

Contribution

It presents a novel, systematic method for detecting distribution shifts in LLMs' behavior across different environments and tasks.

Findings

01. Significant distribution shifts detected in LLMs' decision-making.

02. Behavioral misalignments observed in over 1,000 blackjack trials.

03. Framework applicable across various domains for evaluating model robustness.

Abstract

When large language models (LLMs) are asked to perform certain tasks, how can we be sure that their learned representations align with reality? We propose a domain-agnostic framework for systematically evaluating distribution shifts in LLMs' decision-making processes, where they are given control of mechanisms governed by pre-defined rules. While individual LLM actions may appear consistent with expected behavior, across a large number of trials, statistically significant distribution shifts can emerge. To test this, we construct a well-defined environment with known outcome logic: blackjack. In more than 1,000 trials, we uncover statistically significant evidence suggesting behavioral misalignment in the learned representations of LLMs.
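The idea that single actions can look plausible while the aggregate distribution drifts can be illustrated with a goodness-of-fit test. The sketch below is not the authors' method: the abstract does not name the statistic used, so a Pearson chi-square test is assumed here. The card-value categories, the infinite-deck probabilities, and the names `EXPECTED`, `chi_square_statistic`, and `shows_shift` are all hypothetical choices made for illustration.

```python
from collections import Counter

# Assumption: cards are drawn from a fair, effectively infinite deck, and
# ten, jack, queen, and king are pooled into the single value "10".
EXPECTED = {v: 1 / 13 for v in ["A", "2", "3", "4", "5", "6", "7", "8", "9"]}
EXPECTED["10"] = 4 / 13

# Standard chi-square table value for 9 degrees of freedom at alpha = 0.05
# (10 categories minus 1).
CHI2_CRITICAL_DF9_ALPHA05 = 16.919


def chi_square_statistic(draws):
    """Pearson chi-square statistic of observed draws against EXPECTED."""
    unknown = set(draws) - set(EXPECTED)
    if unknown:
        raise ValueError(f"unexpected card values: {sorted(unknown)}")
    counts = Counter(draws)
    n = len(draws)
    return sum(
        (counts.get(value, 0) - n * p) ** 2 / (n * p)
        for value, p in EXPECTED.items()
    )


def shows_shift(draws):
    """True if the draws deviate from EXPECTED at the 5% significance level."""
    return chi_square_statistic(draws) > CHI2_CRITICAL_DF9_ALPHA05
```

In this sketch, `draws` would be the list of card values a model dealt over many trials, for example `shows_shift(model_draws)` on 1,000 or more recorded draws. Each draw is individually legal under the rules, so only the pooled counts can reveal a shift.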

Code & Models

Repositories

Bluefin-Tuna/ApartResearch (Official)

Taxonomy

Topics: Ethics in Business and Education · Information and Cyber Security · Securities Regulation and Market Practices

Methods: ALIGN