From Correctness to Collaboration: Toward a Human-Centered Framework for Evaluating AI Agent Behavior in Software Engineering

Tao Dong; Harini Sampath; Ja Young Lee; Sherry Y. Shi; Andrew Macvean

arXiv:2512.23844 · cs.SE · January 1, 2026

TL;DR

This paper proposes a human-centered framework for evaluating AI agents in software engineering, emphasizing collaborative behaviors over mere correctness, through a taxonomy and a context-adaptive behavior model.

Contribution

It introduces a taxonomy of desirable agent behaviors and the Context-Adaptive Behavior Framework, shifting evaluation focus to dynamic, collaborative AI-human interactions.

Findings

01 Identified four key agent behavior expectations.

02 Developed the CAB Framework based on empirical axes.

03 Provided insights for designing collaborative AI agents.

Abstract

As Large Language Models (LLMs) evolve from code generators into collaborative partners for software engineers, our methods for evaluation are lagging. Current benchmarks, focused on code correctness, fail to capture the nuanced, interactive behaviors essential for successful human-AI partnership. To bridge this evaluation gap, this paper makes two core contributions. First, we present a foundational taxonomy of desirable agent behaviors for enterprise software engineering, derived from an analysis of 91 sets of user-defined agent rules. This taxonomy defines four key expectations of agent behavior: Adhere to Standards and Processes, Ensure Code Quality and Reliability, Solving Problems Effectively, and Collaborating with the User. Second, recognizing that these expectations are not static, we introduce the Context-Adaptive Behavior (CAB) Framework. This emerging framework reveals how…

Taxonomy

Topics: Software Engineering Techniques and Practices · Software Engineering Research · Multi-Agent Systems and Negotiation