Human-In-The-Loop Software Development Agents: Challenges and Future Directions

Jirat Pasuksmit; Wannita Takerngsaksiri; Patanamon Thongtanunam; Chakkrit Tantithamthavorn; Ruixiong Zhang; Shiyan Wang; Fan Jiang; Jing Li; Evan Cook; Kun Chen; Ming Wu

arXiv:2506.11009 · cs.SE · June 16, 2025

TL;DR

This paper reports on deploying human-in-the-loop LLM agents for software development at Atlassian. It highlights two challenges, the high computational cost of unit testing and the variability of LLM-based evaluations, and proposes future research directions for improving evaluation frameworks.

Contribution

The paper describes a deployment of human-in-the-loop, LLM-driven agents for software development at Atlassian and identifies key evaluation challenges and future research directions.

Findings

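01 Deployed Human-in-the-Loop Software Development Agents to resolve Jira work items.

02 Evaluated the quality of the generated code using functional correctness testing and GPT-based similarity scoring.

03 Identified two major challenges: the high computational cost of unit testing and the variability of LLM-based evaluations.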

Abstract

Multi-agent LLM-driven systems for software development are rapidly gaining traction, offering new opportunities to enhance productivity. At Atlassian, we deployed Human-in-the-Loop Software Development Agents to resolve Jira work items and evaluated the generated code quality using functional correctness testing and GPT-based similarity scoring. This paper highlights two major challenges: the high computational costs of unit testing and the variability in LLM-based evaluations. We also propose future research directions to improve evaluation frameworks for Human-In-The-Loop software development tools.
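
One way to make the evaluation-variability challenge concrete is to score the same generated patch several times and summarize the spread of the scores. The sketch below is only an illustration under stated assumptions: the function name `score_variability`, the 1-to-5 scoring scale, and the sample scores are hypothetical and are not taken from the paper, which does not specify how variability was measured.

```python
from statistics import mean, stdev


def score_variability(scores):
    """Summarize repeated similarity scores assigned to one generated patch.

    `scores` holds the values returned by repeated runs of an LLM-based
    scorer on the same (generated code, reference code) pair.
    """
    if len(scores) < 2:
        raise ValueError("need at least two repeated scores")
    return {
        "mean": mean(scores),
        "stdev": stdev(scores),  # sample standard deviation
        "range": max(scores) - min(scores),
    }


# Hypothetical scores from five repeated runs on a 1-to-5 scale (assumed).
repeated_scores = [3, 4, 3, 5, 4]
summary = score_variability(repeated_scores)
print(summary)
```

A wide range or large standard deviation across repeated runs would suggest that a single LLM-based score is not a stable measure for that patch.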

Taxonomy

Topics: Software System Performance and Reliability