Same Signal, Different Semantics: A Cross-Framework Behavioral Analysis of Software Engineering Agents

Wei Ma; Zhi Chen; Jingxu Gu; Tianling Li; Shangqing Liu; Lingxiao Jiang

arXiv:2605.18332 · cs.SE · May 19, 2026

TL;DR

This study analyzes how the behavioral signals of LLM-based software engineering agents vary across frameworks and finds that the same signal can carry opposite meanings depending on the framework.

Contribution

The paper provides a large-scale empirical analysis (64,380 SWE-bench runs from 126 agent configurations spanning 43 frameworks) showing that framework differences significantly affect behavioral signals, which challenges the generalizability of findings from single-framework studies.

Findings

01. Framework swapping causes large behavioral differences.

02. Behavioral signals often have opposite implications across frameworks.

03. Framework identity explains more variance than LLM family in behavior.
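
To make the "explains more variance" comparison concrete, here is a minimal sketch of one common way to quantify it: the share of total variance in a behavioral measure that lies between groups (one-way eta-squared), computed once with configurations grouped by framework and once grouped by LLM family. This summary does not describe the paper's actual variance analysis, so the technique is an assumption chosen for illustration, and the function name, labels, and toy numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def variance_explained(values, labels):
    """Share of total variance in `values` that lies between the groups
    defined by `labels` (one-way eta-squared). Illustrative only."""
    grand_mean = fmean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # no variance at all, so nothing to explain

    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    # Between-group sum of squares: group size times squared distance
    # of the group mean from the grand mean.
    ss_between = sum(
        len(members) * (fmean(members) - grand_mean) ** 2
        for members in groups.values()
    )
    return ss_between / ss_total


# Hypothetical toy data: one behavioral measure per configuration
# (for example, the fraction of steps that are test runs).
behavior = [0.2, 0.3, 0.8, 0.9]
framework = ["framework_a", "framework_a", "framework_b", "framework_b"]
llm_family = ["llm_x", "llm_y", "llm_x", "llm_y"]

by_framework = variance_explained(behavior, framework)
by_llm = variance_explained(behavior, llm_family)
print(f"framework: {by_framework:.3f}, LLM family: {by_llm:.3f}")
```

In this toy data the measure shifts far more when the framework changes than when the LLM changes, so grouping by framework captures almost all of the variance. Real data would be unbalanced and would need a model that handles both factors jointly.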

Abstract

Behavioral studies of LLM-based software engineering agents extract operational rules about which trajectory shapes correlate with higher resolution rates: that a test step follows a code modification, that error cascades are short, or that trajectories are compact. Each rule is typically derived from a single framework, and whether it transfers, in sign as well as magnitude, to structurally different agent designs has not been directly tested. We address this at ecosystem scale: 64,380 SWE-bench runs from 126 agent configurations spanning 43 frameworks, where each configuration pairs an LLM with a framework (e.g., SWE-Agent, OpenHands) that supplies its tools and workflow. We separate framework effects from LLM effects by holding each layer fixed in turn, then measure one behavior-outcome effect per configuration and examine how those effects agree or disagree. Swapping the…
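
The step of measuring one behavior-outcome effect per configuration and then checking whether the effects agree in sign can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the effect here is a simple difference in resolution rate between runs that show a signal and runs that do not, and the run tuple layout, configuration names, and function names are hypothetical.

```python
from collections import defaultdict


def effect_per_configuration(runs):
    """Map each configuration to (resolution rate with the signal)
    minus (resolution rate without it).

    `runs` is an iterable of (config_id, has_signal, resolved) tuples;
    this layout is an assumption made for the sketch.
    """
    # Per configuration: [n_with, resolved_with, n_without, resolved_without]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for config_id, has_signal, resolved in runs:
        c = counts[config_id]
        if has_signal:
            c[0] += 1
            c[1] += int(resolved)
        else:
            c[2] += 1
            c[3] += int(resolved)

    effects = {}
    for config_id, (n_with, r_with, n_without, r_without) in counts.items():
        if n_with == 0 or n_without == 0:
            continue  # effect is undefined without both kinds of run
        effects[config_id] = r_with / n_with - r_without / n_without
    return effects


def sign_summary(effects):
    """Split configurations by effect sign and flag disagreement."""
    positive = sorted(c for c, e in effects.items() if e > 0)
    negative = sorted(c for c, e in effects.items() if e < 0)
    return {
        "positive": positive,
        "negative": negative,
        "sign_flip": bool(positive and negative),
    }


# Hypothetical toy runs: same LLM, two frameworks, one binary signal
# (for example, "a test step follows a code modification").
toy_runs = [
    ("framework_a+llm_x", True, True),
    ("framework_a+llm_x", True, True),
    ("framework_a+llm_x", False, False),
    ("framework_a+llm_x", False, True),
    ("framework_b+llm_x", True, False),
    ("framework_b+llm_x", True, False),
    ("framework_b+llm_x", False, True),
    ("framework_b+llm_x", False, True),
]

effects = effect_per_configuration(toy_runs)
summary = sign_summary(effects)
print(effects)
print(summary)
```

Holding the LLM fixed while the framework varies, as in the toy data, isolates the framework layer; the same functions apply unchanged when the framework is held fixed and the LLM varies. A rate difference is the simplest possible effect measure and ignores task difficulty and sample size, which a real analysis would need to account for.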
