Measuring Intent Comprehension in LLMs
Nadav Kunievsky, James A. Evans

TL;DR
This paper introduces a formal framework to evaluate whether large language models can reliably understand user intent by producing consistent outputs for semantically equivalent prompts, addressing a key challenge in high-stakes applications.
Contribution
The paper proposes a variance decomposition framework for assessing intent comprehension in LLMs and applies it across multiple models to measure robustness and understanding.
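As a rough illustration of what a variance decomposition of this kind can look like, the sketch below applies the standard law of total variance to model outputs grouped by intent. It is not the paper's estimator. It assumes each output has already been reduced to a scalar score and that each prompt carries a known intent label; the names `decompose_variance`, `intent_share`, and the record layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def decompose_variance(records):
    """Split the population variance of scores into between-intent and
    within-intent parts (law of total variance).

    records: iterable of (intent_id, score) pairs. Both fields are
    assumptions made for this sketch, not the paper's data format.
    Returns (between, within, total).
    """
    groups = defaultdict(list)
    for intent_id, score in records:
        groups[intent_id].append(float(score))

    all_scores = [s for g in groups.values() for s in g]
    n = len(all_scores)
    grand_mean = fmean(all_scores)

    # Total variance of all scores around the grand mean.
    total = sum((s - grand_mean) ** 2 for s in all_scores) / n
    # Variance of the per-intent means, weighted by group size.
    between = sum(
        len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values()
    ) / n
    # Average variance across paraphrases that share an intent.
    within = sum(
        (s - fmean(g)) ** 2 for g in groups.values() for s in g
    ) / n
    return between, within, total


def intent_share(records):
    """Fraction of total variance explained by intent (NaN if no variance)."""
    between, _, total = decompose_variance(records)
    return between / total if total > 0 else float("nan")
```

Under these assumptions, a share near 1 means outputs vary mainly with intent and stay stable across paraphrases, while a share near 0 means phrasing drives most of the variation.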
Findings
Larger models tend to better attribute output variance to user intent.
Model understanding of intent improves with size, but the gains are often modest.
The framework highlights the importance of semantic diagnostics over accuracy-only benchmarks.
Abstract
People judge interactions with large language models (LLMs) as successful when outputs match what they want, not what they type. Yet LLMs are trained to predict the next token solely from text input, not underlying intent. Because written language is an imperfect proxy for intent, and correlations between phrasing and desired outcomes can break down in training data, models that rely too heavily on surface cues may respond inconsistently to semantically equivalent prompts. This makes it essential to evaluate whether LLMs can reliably infer user intent, especially in high-stakes settings where robustness and generalization are critical. We introduce a formal framework for assessing intent comprehension in LLMs: whether a model demonstrates robust understanding of user intent by producing consistent outputs across semantically equivalent prompts while differentiating between prompts with…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification
