Systematic Characterization of the Effectiveness of Alignment in Large Language Models for Categorical Decisions
Isaac Kohane

TL;DR
This paper introduces a systematic methodology and a novel measure, the Alignment Compliance Index (ACI), to evaluate how effectively large language models align with human preferences in high-stakes categorical decisions like medical triage.
Contribution
It presents a new evaluation framework and the ACI measure for assessing LLM alignment effectiveness, applicable beyond in-context learning, with empirical analysis on medical triage models.
Findings
Significant variability in alignment effectiveness across models and methods.
Pre-alignment performance does not guarantee post-alignment improvement.
Small changes in preference functions can cause large shifts in model rankings.
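The second finding can be illustrated with a toy calculation. The sketch below is not the paper's ACI, whose formula is not given in this summary. It assumes a simpler stand-in: the change in agreement rate with a reference labeling before and after alignment. The function names, the labels, and the two models are all hypothetical.

```python
from typing import Sequence


def agreement_rate(decisions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of categorical decisions that match the reference labels."""
    if len(decisions) != len(gold):
        raise ValueError("decisions and gold must have the same length")
    if len(gold) == 0:
        raise ValueError("need at least one case")
    matches = sum(d == g for d, g in zip(decisions, gold))
    return matches / len(gold)


def agreement_change(pre: Sequence[str], post: Sequence[str],
                     gold: Sequence[str]) -> float:
    """Post-alignment agreement minus pre-alignment agreement.

    This is an assumed stand-in for illustration, not the ACI itself.
    """
    return agreement_rate(post, gold) - agreement_rate(pre, gold)


# Hypothetical triage choices ("A" or "B") on four cases.
gold = ["A", "B", "A", "B"]

model_x_pre = ["A", "B", "A", "A"]   # agrees on 3 of 4 cases
model_x_post = ["A", "B", "B", "A"]  # agrees on 2 of 4 cases
model_y_pre = ["B", "A", "A", "A"]   # agrees on 1 of 4 cases
model_y_post = ["A", "B", "A", "B"]  # agrees on 4 of 4 cases

change_x = agreement_change(model_x_pre, model_x_post, gold)
change_y = agreement_change(model_y_pre, model_y_post, gold)

print(f"model_x change: {change_x:+.2f}")
print(f"model_y change: {change_y:+.2f}")
```

In this made-up data, the model that starts with higher agreement gets worse after alignment while the weaker one improves, which is the kind of pattern the finding describes.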
Abstract
As large language models (LLMs) are deployed in high-stakes domains like healthcare, understanding how well their decision-making aligns with human preferences and values becomes crucial, especially when we recognize that there is no single gold standard for these preferences. This paper applies a systematic methodology for evaluating preference alignment in LLMs on categorical decision-making with medical triage as a domain-specific use case. It also measures how effectively an alignment procedure will change the alignment of a specific model. Key to this methodology is a novel simple measure, the Alignment Compliance Index (ACI), that quantifies how effectively an LLM can be aligned to a given preference function or gold standard. Since the ACI measures the effect rather than the process of alignment, it is applicable to alignment methods beyond the in-context learning used in this…
Taxonomy
Topics: Information Systems and Technology Applications · Topic Modeling · Natural Language Processing Techniques
