Quantifying Laziness, Decoding Suboptimality, and Context Degradation in Large Language Models
Yiqing Ma, Jung-Hua Liu

TL;DR
This paper investigates behavioral issues in large language models. It finds prevalent laziness and instruction non-compliance but, surprisingly, robustness against context degradation in long conversations, with implications for improving model reliability.
Contribution
The study provides a systematic quantification of laziness, decoding suboptimality, and context degradation in advanced LLMs through controlled experiments, highlighting areas for improvement.
Findings
Widespread laziness in multi-part instruction compliance
Limited evidence of decoding suboptimality in reasoning tasks
Models retain core facts across 200-turn conversations
Abstract
Large Language Models (LLMs) often exhibit behavioral artifacts such as laziness (premature truncation of responses or partial compliance with multi-part requests), decoding suboptimality (failure to select higher-quality sequences due to myopic decoding), and context degradation (forgetting or ignoring core instructions over long conversations). We conducted three controlled experiments (A, B, and C) to quantify these phenomena across several advanced LLMs (OpenAI GPT-4 variant, DeepSeek). Our results indicate widespread laziness in satisfying complex multi-part instructions: models frequently omitted required sections or failed to meet length requirements despite explicit prompting. However, we found limited evidence of decoding suboptimality in a simple reasoning task (the models' greedy answers appeared to align with their highest-confidence solution), and we observed surprising…
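As an illustration of how the laziness described above (omitted sections, unmet length requirements) might be scored, here is a minimal sketch. It is not the authors' evaluation code: the names `ComplianceReport`, `check_compliance`, and `laziness_rate`, the case-insensitive substring match for section detection, and the whitespace word count are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ComplianceReport:
    """Hypothetical per-response record; not taken from the paper."""
    missing_sections: list[str]
    word_count: int
    meets_length: bool

    @property
    def is_lazy(self) -> bool:
        # Flag a response that omits any required section or is too short.
        return bool(self.missing_sections) or not self.meets_length


def check_compliance(response: str, required_sections: list[str],
                     min_words: int) -> ComplianceReport:
    """Check one response against a multi-part instruction.

    Assumption: a section counts as present if its name appears anywhere
    in the response (case-insensitive substring match).
    """
    lowered = response.lower()
    missing = [s for s in required_sections if s.lower() not in lowered]
    word_count = len(response.split())  # whitespace-delimited words
    return ComplianceReport(missing, word_count, word_count >= min_words)


def laziness_rate(reports: list[ComplianceReport]) -> float:
    """Fraction of responses flagged as lazy; 0.0 for an empty list."""
    if not reports:
        return 0.0
    return sum(r.is_lazy for r in reports) / len(reports)
```

A substring match is deliberately crude: it can miss a section that is present under a different heading, and it can count a section as present when its name is only mentioned in passing. A real study would likely need stricter parsing or human annotation.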
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification
