Prompt Complexity Dilutes Structured Reasoning: A Follow-Up Study on the Car Wash Problem
Heejin Jo

TL;DR
This study shows that prompt complexity and competing instructions sharply reduce the effectiveness of structured reasoning methods such as STAR in production prompts: accuracy falls from 100% with the STAR-only prompt to 0-30% inside a 60+ line production prompt.
Contribution
The paper shows that prompt complexity dilutes the effectiveness of structured reasoning, and that a model upgrade can improve reasoning when the STAR prompt is used in isolation.
Findings
The STAR-only prompt achieves 100% accuracy in isolation (20 trials, verified at n=100).
Inside the production prompt, accuracy drops to 0% with the Anthropic profile and 30% with the default profile.
Model upgrades improve reasoning performance without prompt changes.
Abstract
In a previous study [Jo, 2026], STAR reasoning (Situation, Task, Action, Result) raised car wash problem accuracy from 0% to 85% on Claude Sonnet 4.5, and to 100% with additional prompt layers. This follow-up asks: does STAR maintain its effectiveness in a production system prompt? We tested STAR inside InterviewMate's 60+ line production prompt, which had evolved through iterative additions of style guidelines, format instructions, and profile features. Three conditions, 20 trials each, on Claude Sonnet 4.6: (A) production prompt with Anthropic profile, (B) production prompt with default profile, (C) original STAR-only prompt. C scored 100% (verified at n=100). A and B scored 0% and 30%. Prompt complexity dilutes structured reasoning. STAR achieves 100% in isolation but degrades to 0-30% when surrounded by competing instructions. The mechanism: directives like "Lead with…
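With only 20 trials per condition, it helps to see how much uncertainty surrounds each reported percentage. The following is a minimal sketch, not the author's analysis: it converts the reported accuracies into counts (0/20, 6/20, 20/20, and an assumed 100/100 for the n=100 verification run) and computes a 95% Wilson score interval for each. The function name `wilson_interval`, the `RESULTS` dictionary, and the choice of the Wilson interval are all assumptions made for illustration.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Return the (lower, upper) Wilson score interval for a binomial proportion.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Counts derived from the percentages in the abstract (assumption: 30% of 20 = 6,
# and the n=100 verification of condition C is taken as 100/100).
RESULTS = {
    "A: production prompt, Anthropic profile": (0, 20),
    "B: production prompt, default profile": (6, 20),
    "C: STAR-only prompt": (20, 20),
    "C: STAR-only prompt (n=100 verification)": (100, 100),
}

if __name__ == "__main__":
    for condition, (successes, trials) in RESULTS.items():
        low, high = wilson_interval(successes, trials)
        print(f"{condition}: {successes}/{trials}, 95% CI [{low:.3f}, {high:.3f}]")
```

The Wilson interval is used here because it stays well behaved at 0% and 100%, where the simpler normal-approximation interval collapses to zero width. Under these assumptions, the intervals for conditions A and B do not overlap with the interval for condition C, which is consistent with the abstract's conclusion.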
Taxonomy
Topics: Human-Automation Interaction and Safety · Decision-Making and Behavioral Economics · Design Education and Practice
