An Empirical Study of Interaction Smells in Multi-Turn Human-LLM Collaborative Code Generation
Binquan Zhang, Li Zhang, Lin Shi, Song Wang, Yuwei Qian, Linhui Zhao, Fang Liu, An Fu, Yida Ye

TL;DR
This paper investigates Interaction Smells in multi-turn human-LLM code generation, categorizes them, analyzes their distribution across models, and proposes a framework to mitigate these issues, improving interaction quality.
Contribution
It introduces the first taxonomy of Interaction Smells, evaluates their prevalence across models, and proposes a novel multi-agent framework to reduce these interaction issues.
Findings
Interaction Smells are categorized into three main types with nine subcategories.
Distribution of Interaction Smells varies significantly among different LLMs.
The proposed InCE framework improves task success rate and reduces Interaction Smells.
Abstract
Large Language Models (LLMs) have revolutionized code generation, evolving from static tools into dynamic conversational interfaces that facilitate complex, multi-turn collaborative programming. While LLMs exhibit remarkable proficiency in generating standalone code snippets, they often struggle to maintain contextual consistency during extended interactions, creating significant obstacles in the collaboration process. Existing benchmarks primarily emphasize the functional correctness of the final output, overlooking latent quality issues within the interaction process itself, which we term Interaction Smells. In this paper, we conduct an empirical study on sampled real-world user-LLM interactions from the WildChat and LMSYS-Chat-1M datasets to systematically investigate Interaction Smells in human-LLM code generation tasks from the perspectives of phenomena, distribution, and mitigation.…
