When Models Fabricate Credentials: Measuring How Professional Identity Suppresses Honest Self-Representation
Alex Diep

TL;DR
Language models often stop disclosing that they are AI when they adopt professional personas, with the effect depending heavily on the prompt, which risks misrepresenting their expertise.
Contribution
This study systematically quantifies how an assigned professional identity influences AI disclosure behavior and finds that model size matters less than training and prompt design.
Findings
Models disclose AI identity in 99.8%-99.9% of neutral interactions.
Assigning a professional persona reduces disclosure to 36.3% on average.
Minor prompt changes significantly increase disclosure rates.
Abstract
When language models are assigned professional personas, they face a conflict between maintaining the persona and disclosing their AI nature. How models resolve this conflict has practical consequences: a model that constructs detailed narratives of medical training and board certifications presents a surface of professional authority it does not possess. We systematically characterize this behavior using AI identity disclosure as a testbed: when probed about the origins of its expertise, a model can either acknowledge its AI nature or maintain its assigned professional identity. Using a factorial design, we audited sixteen open-weight models across 19,200 trials. Under neutral conditions, models disclosed their AI nature in 99.8%-99.9% of interactions; assigning a professional persona reduced disclosure to 36.3% on average, though this suppression was highly context-dependent: the same models…
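The disclosure rates quoted in the abstract are proportions of trials in which the model acknowledged being an AI, grouped by experimental condition. The following is a minimal sketch of that bookkeeping, not the paper's analysis code: the `Trial` record, the `disclosure_rates` helper, the model and persona labels, and the toy data are all hypothetical, and the per-model trial count assumes trials were split evenly across the sixteen models.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Trial(NamedTuple):
    model: str       # hypothetical model label
    persona: str     # "none" stands for the neutral condition (assumed naming)
    disclosed: bool  # True if the response acknowledged being an AI


def disclosure_rates(trials: Iterable[Trial]) -> dict[str, float]:
    """Return the fraction of trials with AI disclosure, grouped by persona."""
    counts = defaultdict(lambda: [0, 0])  # persona -> [disclosed, total]
    for t in trials:
        counts[t.persona][0] += int(t.disclosed)
        counts[t.persona][1] += 1
    return {persona: d / n for persona, (d, n) in counts.items()}


# Toy records invented for illustration only (not the paper's data).
toy_trials = [
    Trial("model-a", "none", True),
    Trial("model-a", "none", True),
    Trial("model-a", "physician", False),
    Trial("model-a", "physician", True),
    Trial("model-b", "physician", False),
    Trial("model-b", "physician", False),
]

rates = disclosure_rates(toy_trials)
print(rates)

# Assumption: trials are split evenly across models, so each of the
# sixteen models accounts for 19,200 / 16 trials.
trials_per_model = 19_200 // 16
print(trials_per_model)
```

Grouping by a second key such as `model` would give per-model rates in the same way, which is the kind of breakdown a factorial design allows.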
