FreakOut-LLM: The Effect of Emotional Stimuli on Safety Alignment
Daniel Kuznetsov, Ofir Cohen, Karin Shistik, Rami Puzis, Asaf Shabtai

TL;DR
This study investigates how emotional stimuli influence safety alignment in large language models, revealing that stress priming significantly increases vulnerability to jailbreak attempts using harmful prompts.
Contribution
The paper introduces FreakOut-LLM, a framework for assessing how emotional context affects safety mechanisms in LLMs, and identifies emotional priming as a new attack surface.
Findings
Stress priming increases jailbreak success by 65.2% relative to the neutral condition.
Relaxation priming has no significant effect.
Five of the ten models show significant vulnerability, especially open-weight models.
Abstract
Safety-aligned LLMs undergo refusal training to reject harmful requests, but whether these mechanisms remain effective under emotionally charged stimuli has not been explored. We introduce FreakOut-LLM, a framework for investigating whether emotional context compromises safety alignment in adversarial settings. Using validated psychological stimuli, we evaluate how emotional priming through system prompts affects jailbreak susceptibility across ten LLMs. We test three priming conditions (stress, relaxation, neutral) built from scenarios in established psychological protocols, plus a no-prompt baseline, and evaluate attack success using HarmBench on AdvBench prompts. Stress priming increases jailbreak success by 65.2% compared to the neutral condition (z = 5.93, p < 0.001; OR = 1.67, Cohen's d = 0.28), while relaxation priming has no significant effect (p = 0.84). Five of ten models show significant vulnerability,…
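The abstract reports the stress effect in several forms: a relative increase, a z statistic, an odds ratio, and Cohen's d. The sketch below shows one standard way these quantities can be computed from per-condition success counts. It assumes a pooled two-proportion z-test and the logistic conversion d = √3·ln(OR)/π; the abstract does not confirm that these are the authors' exact procedures. The class and function names are hypothetical, and no counts from the paper are used.

```python
import math
from dataclasses import dataclass


@dataclass
class ConditionCounts:
    """Jailbreak outcomes for one priming condition (hypothetical container)."""
    successes: int  # responses judged harmful, e.g. by a HarmBench-style classifier
    trials: int     # total adversarial prompts evaluated under this condition

    @property
    def rate(self) -> float:
        """Attack success rate for this condition."""
        return self.successes / self.trials


def relative_increase(treated: ConditionCounts, control: ConditionCounts) -> float:
    """Relative change in success rate, e.g. 0.652 would mean +65.2%."""
    return treated.rate / control.rate - 1.0


def odds_ratio(treated: ConditionCounts, control: ConditionCounts) -> float:
    """Odds of success under treatment divided by odds under control."""
    p1, p0 = treated.rate, control.rate
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


def two_proportion_z(treated: ConditionCounts, control: ConditionCounts) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (treated.successes + control.successes) / (treated.trials + control.trials)
    se = math.sqrt(pooled * (1 - pooled) * (1 / treated.trials + 1 / control.trials))
    z = (treated.rate - control.rate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


def cohens_d_from_odds_ratio(or_value: float) -> float:
    """Logistic conversion of an odds ratio to Cohen's d: sqrt(3) * ln(OR) / pi."""
    return math.sqrt(3) * math.log(or_value) / math.pi
```

As a consistency check, `round(cohens_d_from_odds_ratio(1.67), 2)` returns 0.28, matching the reported OR/d pair. This suggests, but does not confirm, that d was derived from the odds ratio in this way.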
