Five Fatal Assumptions: Why T-Shirt Sizing Systematically Fails for AI Projects
Raja Soundaramourty, Ozkan Kilic, Ramu Chenchaiah

TL;DR
This paper critically examines the limitations of T-shirt sizing in AI projects, revealing five core assumptions that often lead to misestimations, and proposes Checkpoint Sizing as a more reliable alternative.
Contribution
It identifies five fundamental assumptions of T-shirt sizing that fail in AI contexts and introduces Checkpoint Sizing as a practical, iterative estimation method for AI development.
Findings
T-shirt sizing assumptions often fail in AI projects
AI development exhibits non-linear performance and complex interactions
Checkpoint Sizing improves scope and feasibility assessment
Abstract
Agile estimation techniques, particularly T-shirt sizing, are widely used in software development for their simplicity and utility in scoping work. However, when we apply these methods to artificial intelligence initiatives -- especially those involving large language models (LLMs) and multi-agent systems -- the results can be systematically misleading. This paper shares an evidence-backed analysis of five foundational assumptions we often make during T-shirt sizing. While these assumptions usually hold true for traditional software, they tend to fail in AI contexts: (1) linear effort scaling, (2) repeatability from prior experience, (3) effort-duration fungibility, (4) task decomposability, and (5) deterministic completion criteria. Drawing on recent research into multi-agent system failures, scaling principles, and the inherent unreliability of multi-turn conversations, we show how AI…
Taxonomy
Topics: Software Engineering Techniques and Practices · Software Engineering Research · Software System Performance and Reliability
