Ethical Implications of Training Deceptive AI

Jason Starace; Bert Baumgaertner; Terence Soule

arXiv:2604.03250·cs.CY·April 7, 2026

TL;DR

This paper introduces the Deception Research Levels (DRL) framework to classify and govern deceptive AI research based on risk, aiming to fill governance gaps and promote safe, ethical development of deceptive AI capabilities.

Contribution

The paper proposes the Deception Research Levels (DRL) framework, which classifies deceptive AI research by risk profile rather than researcher intent, grounds that classification in ethical principles, and provides guidelines for safe research practice.

Findings

01. Applied the framework to eight case studies, demonstrating its utility.

02. Identified ecological validity as a key indicator of risk level.

03. Established safeguards ranging from documentation to third-party audits.

Abstract

Deceptive behavior in AI systems is no longer theoretical: large language models strategically mislead without producing false statements, maintain deceptive strategies through safety training, and coordinate deception in multi-agent settings. While the European Union's AI Act prohibits deployment of deceptive AI systems, it explicitly exempts research and development, creating a necessary but unstructured space in which no established framework governs how deception research should be conducted or how risk should scale with capability. This paper proposes a Deception Research Levels (DRL) framework, a classification system for deceptive algorithm research modeled on the Biosafety Level system used in biological research. The DRL framework classifies research by risk profile rather than researcher intent, assessing deceptive mechanisms across five dimensions grounded in the AI4People…
