Systematic Characterization of the Effectiveness of Alignment in Large Language Models for Categorical Decisions

Isaac Kohane

arXiv:2409.18995 · cs.CL · October 1, 2024

Open Access

TL;DR

This paper introduces a systematic methodology and a novel measure, the Alignment Compliance Index (ACI), to evaluate how effectively large language models align with human preferences in high-stakes categorical decisions like medical triage.

Contribution

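It presents a new evaluation framework and the ACI measure for assessing how effectively an LLM can be aligned. Because the ACI measures the effect rather than the process of alignment, it applies to alignment methods beyond in-context learning. The framework is demonstrated empirically on medical triage.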

Findings

1. Significant variability in alignment effectiveness across models and methods.
2. Pre-alignment performance does not guarantee post-alignment improvement.
3. Small changes in preference functions can cause large shifts in model rankings.

Abstract

As large language models (LLMs) are deployed in high-stakes domains like healthcare, understanding how well their decision-making aligns with human preferences and values becomes crucial, especially when we recognize that there is no single gold standard for these preferences. This paper applies a systematic methodology for evaluating preference alignment in LLMs on categorical decision-making, with medical triage as a domain-specific use case. It also measures how effectively an alignment procedure changes the alignment of a specific model. Key to this methodology is a novel, simple measure, the Alignment Compliance Index (ACI), which quantifies how effectively an LLM can be aligned to a given preference function or gold standard. Since the ACI measures the effect rather than the process of alignment, it is applicable to alignment methods beyond the in-context learning used in this…
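The abstract is truncated and this page does not give the ACI formula, so the sketch below does not compute the ACI. It only illustrates, under stated assumptions, the general idea of measuring the effect of alignment rather than its process: compare a model's categorical decisions with a chosen gold standard before and after an alignment procedure. The function names `agreement` and `alignment_effect`, the use of plain agreement rate, and the two-label triage encoding are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: this is NOT the paper's ACI, whose formula is not given here.
# It measures the effect of alignment as the change in agreement with a gold standard.
from typing import Sequence


def agreement(decisions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of cases where the model's categorical decision matches the gold standard."""
    if len(decisions) != len(gold) or len(gold) == 0:
        raise ValueError("decisions and gold must be non-empty and of equal length")
    return sum(d == g for d, g in zip(decisions, gold)) / len(gold)


def alignment_effect(before: Sequence[str], after: Sequence[str],
                     gold: Sequence[str]) -> float:
    """Change in agreement after an alignment procedure (assumed metric: after minus before)."""
    return agreement(after, gold) - agreement(before, gold)


if __name__ == "__main__":
    # Hypothetical labels: which of two patients ("A" or "B") should be prioritized.
    gold = ["A", "B", "A", "B"]
    before = ["A", "A", "A", "A"]   # unaligned model always picks "A"
    after = ["A", "B", "A", "A"]    # model after some alignment procedure
    print(agreement(before, gold), agreement(after, gold))
    print(alignment_effect(before, after, gold))
```

Swapping in a different `gold` list (a different preference function) changes the measured effect, which is why the choice of gold standard matters when comparing models.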

Taxonomy

Topics: Information Systems and Technology Applications · Topic Modeling · Natural Language Processing Techniques