Decoding Human-LLM Collaboration in Coding: An Empirical Study of Multi-Turn Conversations in the Wild
Binquan Zhang, Li Zhang, Haoyuan Zhang, Fang Liu, Song Wang, Bo Shen, An Fu, Lin Shi

TL;DR
This paper empirically analyzes how humans collaborate with large language models in coding tasks, revealing interaction patterns, challenges in instruction following, and factors influencing user satisfaction to improve AI-assisted development.
Contribution
It systematically explores human-LLM collaboration mechanisms in coding, analyzing datasets to identify interaction patterns, challenges, and satisfaction factors, offering insights for interface improvements.
Findings
Task types influence interaction patterns (linear, star, tree).
Bug fixing and refactoring challenge LLM instruction following.
User satisfaction varies with task type and query complexity.
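The three interaction patterns named above can be illustrated with a small sketch. It is not the paper's method: it assumes, hypothetically, that each turn has already been annotated with the index of the earlier turn it follows up on, and it uses simplified working definitions (linear: every turn follows the one immediately before it; star: every follow-up refers back to the first turn; tree: any other branching shape). The function name `classify_pattern` and the `parents` encoding are illustrative only.

```python
from typing import List, Optional


def classify_pattern(parents: List[Optional[int]]) -> str:
    """Classify a conversation as 'linear', 'star', or 'tree'.

    parents[i] is the index of the earlier turn that turn i follows up on
    (an assumed annotation, not something the datasets provide directly).
    The first turn is the root and must have parent None.
    """
    if not parents or parents[0] is not None:
        raise ValueError("the first turn must be the root (parent None)")
    for i, p in enumerate(parents[1:], start=1):
        if p is None or not (0 <= p < i):
            raise ValueError(f"turn {i} must point to an earlier turn")

    n = len(parents)
    # Linear: each turn builds on the turn directly before it.
    if all(parents[i] == i - 1 for i in range(1, n)):
        return "linear"
    # Star: every follow-up goes back to the first turn.
    if all(parents[i] == 0 for i in range(1, n)):
        return "star"
    # Anything else branches at more than one level.
    return "tree"


if __name__ == "__main__":
    print(classify_pattern([None, 0, 1, 2]))  # chain of follow-ups
    print(classify_pattern([None, 0, 0, 0]))  # all turns revisit turn 0
    print(classify_pattern([None, 0, 0, 2]))  # mixed branching
```

Conversations with one or two turns satisfy both the linear and star conditions; this sketch labels them linear because that check runs first.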
Abstract
Large language models (LLMs) are increasingly acting as dynamic conversational interfaces, supporting multi-turn interactions that mimic human-like conversation and facilitate complex tasks like coding. While datasets such as LMSYS-Chat-1M and WildChat capture real-world user-LLM conversations, few studies systematically explore the mechanisms of human-LLM collaboration in coding scenarios. What tortuous paths do users experience during the interaction process? How well do the LLMs follow instructions? Are users satisfied? In this paper, we conduct an empirical analysis of human-LLM coding collaboration using the LMSYS-Chat-1M and WildChat datasets to explore the human-LLM collaboration mechanism, LLMs' instruction-following ability, and human satisfaction. This study yields interesting findings: 1) Task types shape interaction patterns (linear, star, and tree), with code quality optimization…
Taxonomy
Topics: Topic Modeling · AI in Service Interactions · Speech and dialogue systems
