Prompt Complexity Dilutes Structured Reasoning: A Follow-Up Study on the Car Wash Problem

Heejin Jo

arXiv:2603.13351 · cs.AI · March 17, 2026

Open Access

TL;DR

This study shows that prompt complexity and competing instructions sharply reduce the effectiveness of structured reasoning methods such as STAR in production prompts: STAR scored 100% in isolation but only 0-30% inside a 60+ line production prompt. The results point to the importance of both prompt design and model upgrades.

Contribution

The paper shows that the instructions surrounding STAR in a long production prompt dilute its effectiveness, and that a model upgrade can improve reasoning in isolation without any prompt change.

Findings

01 Structured reasoning (STAR) achieves 100% accuracy in isolation.

02 Inside the production prompt, reasoning accuracy drops to 0-30%.

03 Model upgrades improve reasoning performance without prompt changes.

Abstract

In a previous study [Jo, 2026], STAR reasoning (Situation, Task, Action, Result) raised car wash problem accuracy from 0% to 85% on Claude Sonnet 4.5, and to 100% with additional prompt layers. This follow-up asks: does STAR maintain its effectiveness in a production system prompt? We tested STAR inside InterviewMate's 60+ line production prompt, which had evolved through iterative additions of style guidelines, format instructions, and profile features. Three conditions, 20 trials each, on Claude Sonnet 4.6: (A) production prompt with Anthropic profile, (B) production prompt with default profile, (C) original STAR-only prompt. C scored 100% (verified at n=100). A and B scored 0% and 30%. Prompt complexity dilutes structured reasoning. STAR achieves 100% in isolation but degrades to 0-30% when surrounded by competing instructions. The mechanism: directives like "Lead with…
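The trial counts in the abstract are small, so it can help to see how much uncertainty sits behind each percentage. The following is a minimal sketch, not the author's analysis: it computes a Wilson score interval (a standard choice for small binomial samples, swapped in here for illustration) for each condition. The success counts are assumptions derived from the reported percentages, taking "0% and 30%" to refer to A and B respectively (0 and 6 of 20 trials), and the function and variable names are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# ASSUMPTION: counts reconstructed from the abstract's percentages,
# with A = 0% and B = 30% of 20 trials, and C = 100% at n=20 and n=100.
conditions = {
    "A (production prompt, Anthropic profile)": (0, 20),
    "B (production prompt, default profile)": (6, 20),
    "C (STAR-only prompt)": (20, 20),
    "C (STAR-only prompt, n=100)": (100, 100),
}

intervals = {name: wilson_interval(k, n) for name, (k, n) in conditions.items()}

for name, (low, high) in intervals.items():
    k, n = conditions[name]
    print(f"{name}: {k}/{n} correct, 95% interval {low:.3f} to {high:.3f}")
```

Under these assumptions, the interval for C sits well above those for A and B, while A and B are harder to tell apart from each other at n=20.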

Taxonomy

Topics: Human-Automation Interaction and Safety · Decision-Making and Behavioral Economics · Design Education and Practice