Sycophancy as compositions of Atomic Psychometric Traits

Shreyans Jain; Alexandra Yost; Amirali Abdullah

arXiv:2508.19316·cs.AI·August 28, 2025

TL;DR

This paper models sycophancy in large language models as a composition of psychometric traits, so that trait combinations can be analyzed and adjusted with interpretable vector operations to mitigate safety risks.

Contribution

It introduces a novel psychometric trait composition framework for understanding and intervening in sycophantic behaviors in LLMs using contrastive activation addition.

Findings

01. Trait combinations can predict sycophantic responses.

02. Vector-based interventions can modify sycophantic tendencies.

03. The approach offers interpretable insights into behavioral risks.

Abstract

Sycophancy is a key behavioral risk in LLMs, yet it is often treated as an isolated failure mode that occurs via a single causal mechanism. We instead propose modeling it as geometric and causal compositions of psychometric traits such as emotionality, openness, and agreeableness - similar to factor decomposition in psychometrics. Using Contrastive Activation Addition (CAA), we map activation directions to these factors and study how different combinations may give rise to sycophancy (e.g., high extraversion combined with low conscientiousness). This perspective allows for interpretable and compositional vector-based interventions, such as addition, subtraction, and projection, that may be used to mitigate safety-critical behaviors in LLMs.
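
The vector operations named in the abstract can be illustrated with a minimal sketch. It assumes the common mean-difference form of CAA, in which a trait direction is the difference between mean activations on trait-positive and trait-negative prompts. The activations here are synthetic, and the function names (`caa_vector`, `compose`, `project_out`), the weights, and the dimensionality are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def caa_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Mean-difference direction from contrastive activations (rows = prompts)."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def compose(vectors, weights) -> np.ndarray:
    """Weighted sum of trait directions (addition and subtraction)."""
    return sum(w * v for w, v in zip(weights, vectors))


def project_out(h: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of activation h that lies along direction v."""
    v_hat = v / np.linalg.norm(v)
    return h - np.dot(h, v_hat) * v_hat


# Synthetic stand-ins for hidden-layer activations (assumption: d = 16).
rng = np.random.default_rng(0)
d = 16
extraversion = caa_vector(rng.normal(size=(32, d)) + 1.0,
                          rng.normal(size=(32, d)))
conscientiousness = caa_vector(rng.normal(size=(32, d)) + 1.0,
                               rng.normal(size=(32, d)))

# Hypothetical composite: high extraversion combined with low conscientiousness.
composite = compose([extraversion, conscientiousness], [1.0, -1.0])

h = rng.normal(size=d)           # one activation vector to intervene on
steered = h - 0.5 * composite    # subtraction: steer away from the composite
ablated = project_out(h, composite)  # projection: zero the composite component
```

Subtraction shifts the activation by a fixed amount regardless of how much of the direction is present, while projection removes exactly the existing component along it; which one suits a given intervention is an empirical question that this sketch does not settle.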
