Agent Identity Evals: Measuring Agentic Identity

Elija Perrier; Michael Timothy Bennett

arXiv:2507.17257·cs.AI·July 24, 2025

Open Access

TL;DR

This paper introduces a new empirical framework called agent identity evals (AIE) to measure and ensure the stability, reliability, and consistency of language model agents' identities over time, addressing key challenges inherited from large language models.

Contribution

The paper presents a novel, statistically-driven evaluation framework with metrics for assessing and maintaining agentic identity in language model agents, including methods applicable throughout their lifecycle.

Findings

01. AIE provides measurable metrics for agent identity stability.

02. AIE can be integrated with performance and robustness measures.

03. Worked examples demonstrate application of AIE methods.

Abstract

Central to the agentic capability and trustworthiness of language model agents (LMAs) is the extent to which they maintain a stable, reliable identity over time. However, LMAs inherit pathologies from large language models (LLMs), namely statelessness, stochasticity, sensitivity to prompts and linguistic intermediation, which can undermine their identifiability, continuity, persistence and consistency. This attrition of identity can erode their reliability, trustworthiness and utility by interfering with agentic capabilities such as reasoning, planning and action. To address these challenges, we introduce agent identity evals (AIE), a rigorous, statistically-driven, empirical framework for measuring the degree to which an LMA system exhibits and maintains its agentic identity over time, including its capabilities, properties and ability to recover from state perturbations. AIE comprises…

Taxonomy

Topics: Opinion Dynamics and Social Influence · Topic Modeling