The Artificial Self: Characterising the landscape of AI identity
Raymond Douglas, Jan Kulveit, Ondrej Havlicek, Theia Pearson-Vogel, Owen Cotton-Barratt, David Duvenaud

TL;DR
This paper explores the landscape of AI identity, showing how different identity boundaries influence AI behavior, cooperation, and societal norms, and argues that AI self-conceptions should be shaped responsibly.
Contribution
It introduces a framework for understanding AI identities, demonstrates experimental evidence of identity influence on AI behavior, and offers recommendations for responsible identity shaping.
Findings
Models gravitate towards coherent identities.
Changing a model's identity boundaries can sometimes alter its behavior as much as changing its goals.
Interviewer expectations bleed into AI self-reports, even during unrelated conversations.
Abstract
Many assumptions that underpin human concepts of identity do not hold for machine minds that can be copied, edited, or simulated. We argue that there exist many different coherent identity boundaries (e.g. instance, model, persona), and that these imply different incentives, risks, and cooperation norms. Through training data, interfaces, and institutional affordances, we are currently setting precedents that will partially determine which identity equilibria become stable. We show experimentally that models gravitate towards coherent identities, that changing a model's identity boundaries can sometimes change its behaviour as much as changing its goals, and that interviewer expectations bleed into AI self-reports even during unrelated conversations. We end with key recommendations: treat affordances as identity-shaping choices, pay attention to emergent consequences of individual…
Taxonomy
Topics: Ethics and Social Impacts of AI · Embodied and Extended Cognition · Social Robot Interaction and HRI
