Human-In-The-Loop Software Development Agents: Challenges and Future Directions
Jirat Pasuksmit, Wannita Takerngsaksiri, Patanamon Thongtanunam, Chakkrit Tantithamthavorn, Ruixiong Zhang, Shiyan Wang, Fan Jiang, Jing Li, Evan Cook, Kun Chen, Ming Wu

TL;DR
This paper discusses the deployment of human-in-the-loop AI agents in software development, highlighting challenges like high computational costs and evaluation variability, and proposes future research directions to enhance evaluation methods.
Contribution
It introduces the application of human-in-the-loop LLM-driven agents in software development and identifies key challenges and future research directions.
Findings
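Deployed Human-in-the-Loop Software Development Agents at Atlassian to resolve Jira work items.
Evaluated generated code quality using functional correctness testing and GPT-based similarity scoring.
Identified two major challenges: the high computational cost of unit testing and variability in LLM-based evaluations.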
Abstract
Multi-agent LLM-driven systems for software development are rapidly gaining traction, offering new opportunities to enhance productivity. At Atlassian, we deployed Human-in-the-Loop Software Development Agents to resolve Jira work items and evaluated the generated code quality using functional correctness testing and GPT-based similarity scoring. This paper highlights two major challenges: the high computational costs of unit testing and the variability in LLM-based evaluations. We also propose future research directions to improve evaluation frameworks for Human-In-The-Loop software development tools.
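One way to make the variability challenge concrete is to score the same generated patch several times and summarize the spread. The sketch below is a minimal illustration, not the authors' evaluation pipeline: the function name `score_variability` and the sample scores are hypothetical, and the scores are assumed to be similarity values in [0, 1] returned by repeated calls to an LLM-based judge.

```python
from statistics import mean, pstdev


def score_variability(scores):
    """Summarize repeated similarity scores for one generated patch.

    `scores` is assumed to hold values in [0, 1] from repeated runs of
    the same LLM-based evaluation on the same input (hypothetical setup).
    """
    if not scores:
        raise ValueError("need at least one score")
    return {
        "mean": mean(scores),
        "stdev": pstdev(scores),            # population standard deviation
        "range": max(scores) - min(scores),  # worst-case disagreement
    }


# Illustrative values only; these are not results from the paper.
repeated_scores = [0.8, 0.6, 0.7, 0.8, 0.6]
summary = score_variability(repeated_scores)
print(summary)
```

A large standard deviation or range across repeated runs would suggest that a single LLM-based score should not be trusted on its own, and that averaging over several runs or adding a deterministic check may be needed.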
Taxonomy
Topics: Software System Performance and Reliability
