Constraint Decay: The Fragility of LLM Agents in Backend Code Generation
Francesco Dente, Dario Satriani, Paolo Papotti

TL;DR
This paper systematically evaluates how large language model agents handle structural constraints in backend code generation, revealing a decline in performance as complexity increases and highlighting key challenges in meeting non-functional requirements.
Contribution
The paper introduces a benchmark for structural constraint adherence in backend code generation and analyzes how structural complexity and framework differences affect agent performance.
Findings
Agent performance drops by 30 points on average with increased structural complexity.
Success rates are high in minimal frameworks like Flask but much lower in convention-heavy frameworks.
Data-layer defects are the main root cause of errors in generated code.
Abstract
Large Language Model (LLM) agents demonstrate strong performance in autonomous code generation under loose specifications. However, production-grade software requires strict adherence to structural constraints, such as architectural patterns, databases, and object-relational mappings. Existing benchmarks often overlook these non-functional requirements, rewarding functionally correct but structurally arbitrary solutions. We present a systematic study evaluating how well agents handle structural constraints in multi-file backend generation. By fixing a unified API contract across 80 greenfield generation tasks and 20 feature-implementation tasks spanning eight web frameworks, we isolate the effect of structural complexity using a dual evaluation with end-to-end behavioral tests and static verifiers. Our findings reveal a phenomenon of constraint decay: as structural requirements…
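To make the idea of a static verifier concrete, the sketch below checks one kind of structural constraint on a generated file: which libraries it imports. This is an illustrative assumption, not the paper's verifier. The function names `imported_modules` and `check_structure`, and the choice of import-level checks, are hypothetical; the abstract does not describe how the actual verifiers are built.

```python
import ast


def imported_modules(source):
    """Return the set of top-level module names imported by a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import models"
            modules.add(node.module.split(".")[0])
    return modules


def check_structure(source, required, forbidden):
    """Hypothetical structural check: report required imports that are absent
    and forbidden imports that are present."""
    found = imported_modules(source)
    return {
        "missing": sorted(set(required) - found),
        "forbidden_used": sorted(set(forbidden) & found),
    }


if __name__ == "__main__":
    generated = "import sqlite3\nfrom flask import Flask\n"
    # Assumed constraint for illustration: use Flask with SQLAlchemy, not raw sqlite3.
    print(check_structure(generated, {"flask", "sqlalchemy"}, {"sqlite3"}))
```

A file can pass every end-to-end behavioral test and still fail a check like this one, which is the gap a dual evaluation is meant to expose.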
