Who Owns This Agent? Tracing AI Agents Back to Their Owners
Ruben Chocron, Doron Jonathan Ben Chayim, Eyal Lenga, Gilad Gressel, Alina Oprea, Yisroel Mirsky

TL;DR
This paper introduces a protocol for tracing AI agents back to the accounts that deployed them, addressing the accountability gap in autonomous agent deployment.
Contribution
It formalizes the problem of agent attribution and proposes a practical, canary-based solution that is robust against adversarial content filtering.
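This summary does not describe the canary construction itself, so the following is only a generic illustration of what canary-based attribution could look like, not the paper's protocol. It assumes a hypothetical vendor-side `CanaryRegistry` that issues a per-session token bound to an account with an HMAC tag, and later maps any observed text containing a valid token back to that account. The token format `cnry-<session>-<tag>`, the class name, and the key handling are all assumptions made for illustration.

```python
import hashlib
import hmac
import re
import secrets

# Hypothetical token format (an assumption, not taken from the paper):
# cnry-<16 hex session id>-<16 hex truncated HMAC tag>
CANARY_RE = re.compile(r"cnry-([0-9a-f]{16})-([0-9a-f]{16})")


class CanaryRegistry:
    """Illustrative vendor-side registry linking canaries to accounts."""

    def __init__(self, key: bytes):
        self._key = key          # vendor secret used to authenticate canaries
        self._sessions = {}      # session_id -> account_id

    def _tag(self, session_id: str) -> str:
        # Truncated HMAC-SHA256 over the session id; prevents forged canaries.
        digest = hmac.new(self._key, session_id.encode(), hashlib.sha256)
        return digest.hexdigest()[:16]

    def issue(self, account_id: str) -> str:
        """Create a fresh canary for a new agent session of this account."""
        session_id = secrets.token_hex(8)  # 16 hex characters
        self._sessions[session_id] = account_id
        return f"cnry-{session_id}-{self._tag(session_id)}"

    def attribute(self, observed_text: str):
        """Return the account behind the first valid canary in the text, else None."""
        for session_id, tag in CANARY_RE.findall(observed_text):
            if hmac.compare_digest(tag, self._tag(session_id)):
                return self._sessions.get(session_id)
        return None


if __name__ == "__main__":
    registry = CanaryRegistry(secrets.token_bytes(32))
    canary = registry.issue("account-123")
    observed = f"Message seen by an affected party ... {canary} ..."
    print(registry.attribute(observed))
```

A literal token like this would be easy for an operator to strip out, which is exactly the adversarial content filtering the paper says its constructions resist; this sketch makes no attempt at that robustness and only shows the issue-then-look-up flow.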
Findings
The attribution method is reliable across various scenarios.
Canary constructions resist content filtering without harming agent performance.
The protocol is scalable for deployment by vendors.
Abstract
AI agents are increasingly deployed to act autonomously in the world, yet there is still no reliable way to trace a harmful agent back to the account that deployed it. This creates the same accountability gap across both ends of the intent spectrum: benign operators may deploy misconfigured or overbroad agents that cause harm unintentionally, while malicious operators may deliberately weaponize agents for scams, harassment, or cyber attacks. In many cases, these agents are powered by vendor-hosted models, a dependency that holds even for sophisticated adversaries such as state actors conducting cyber operations. In either case, affected parties can observe the behavior but cannot notify the responsible operator, stop the session, or identify the account for investigation. We formalize this gap as the problem of agent attribution: linking an observed agent interaction to the…
