Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment
Bobo Li, Rui Wu, Zibo Ji, Meishan Zhang, Hao Fei, Min Zhang, Mong-Li Lee, Wynne Hsu

TL;DR
This paper addresses Actor-Observer Asymmetry in multi-agent systems, introducing ReTAS, a dialectical training method that reduces bias and improves fault attribution consistency.
Contribution
It presents ReTAS, a novel dialectical alignment training approach that mitigates perspective-induced biases in multi-agent reasoning.
Findings
ReTAS reduces attribution inconsistency in agents.
It improves fault resolution rates in ambiguous scenarios.
Without intervention, simply swapping perspectives triggers Actor-Observer Asymmetry in over 20% of cases for most models.
Abstract
Large Language Model agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-agent frameworks assigning specialized roles are increasingly adopted to enable self-reflection and mutual auditing. While such role-playing effectively leverages domain expert knowledge, we find it simultaneously induces a human-like cognitive bias known as Actor-Observer Asymmetry (AOA). Specifically, an agent acting as an actor (during self-reflection) tends to attribute failures to external factors, whereas an observer (during mutual auditing) attributes the same errors to internal faults. We quantify this using our new Ambiguous Failure Benchmark, which reveals that simply swapping perspectives triggers the AOA effect in over 20% of cases for most models. To tame this bias, we introduce ReTAS (Reasoning…
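The perspective-swap measurement described in the abstract can be illustrated with a minimal sketch. It assumes each ambiguous failure case has been labeled twice, once by the agent as actor and once as observer, with a label of "internal" or "external". The names `PairedAttribution` and `aoa_rate` and the label scheme are hypothetical and chosen for illustration; they are not the paper's benchmark code or its exact metric definition.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PairedAttribution:
    """One failure case judged from both perspectives (hypothetical schema)."""
    case_id: str
    actor_label: str     # attribution given during self-reflection
    observer_label: str  # attribution given during mutual auditing


def aoa_rate(pairs: Sequence[PairedAttribution]) -> float:
    """Fraction of cases showing the asymmetric pattern.

    A case counts when the actor blames external factors while the
    observer blames an internal fault for the same failure.
    Returns 0.0 for an empty input.
    """
    if not pairs:
        return 0.0
    asymmetric = sum(
        1
        for p in pairs
        if p.actor_label == "external" and p.observer_label == "internal"
    )
    return asymmetric / len(pairs)


# Toy data, invented for illustration only.
example_pairs = [
    PairedAttribution("case-1", "external", "internal"),  # asymmetric
    PairedAttribution("case-2", "internal", "internal"),  # consistent
    PairedAttribution("case-3", "external", "external"),  # consistent
    PairedAttribution("case-4", "internal", "external"),  # reversed, not counted
]
example_rate = aoa_rate(example_pairs)
```

Counting only the actor-external, observer-internal direction follows the pattern the abstract describes; a broader inconsistency measure would count any disagreement between the two labels.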
