Do You Trust Me? Cognitive-Affective Signatures of Trustworthiness in Large Language Models
Gerard Yeo, Svetlana Churina, Kokil Jaidka

TL;DR
This study investigates how large language models encode perceived trustworthiness, revealing that they implicitly internalize trust signals related to fairness, certainty, and accountability without explicit supervision.
Contribution
It demonstrates that instruction-tuned LLMs encode psychologically grounded trust signals internally, providing a foundation for developing more credible and transparent AI systems.
Findings
Trust cues are implicitly encoded during pretraining.
Linearly decodable trust signals are present in model activations.
Fine-tuning refines trust representations without restructuring them.
Abstract
Perceived trustworthiness underpins how users navigate online information, yet it remains unclear whether large language models (LLMs), increasingly embedded in search, recommendation, and conversational systems, represent this construct in psychologically coherent ways. We analyze how instruction-tuned LLMs (Llama 3.1 8B, Qwen 2.5 7B, Mistral 7B) encode perceived trustworthiness in web-like narratives using the PEACE-Reviews dataset annotated for cognitive appraisals, emotions, and behavioral intentions. Across models, systematic layer- and head-level activation differences distinguish high- from low-trust texts, revealing that trust cues are implicitly encoded during pretraining. Probing analyses show linearly decodable trust signals and fine-tuning effects that refine rather than restructure these representations. Strongest associations emerge with appraisals of fairness, certainty,…
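To make "linearly decodable" concrete, the following is a minimal sketch of a linear probe, not the authors' pipeline. It assumes hidden-state vectors are already available and substitutes synthetic Gaussian vectors for them; the function names, dimensions, and hyperparameters are all hypothetical. A logistic-regression probe is trained on part of the data, and held-out accuracy indicates whether the high/low-trust label can be read off with a single linear direction.

```python
import math
import random


def make_synthetic_activations(n_per_class=150, dim=16, shift=2.0, seed=0):
    """Hypothetical stand-in for hidden states (NOT real model data):
    unit Gaussian noise plus a label-dependent shift along one direction."""
    rng = random.Random(seed)
    direction = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    direction = [d / norm for d in direction]
    X, y = [], []
    for label in (0, 1):
        sign = 1.0 if label == 1 else -1.0
        for _ in range(n_per_class):
            X.append([rng.gauss(0, 1) + sign * shift * d for d in direction])
            y.append(label)
    return X, y


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(X, y, lr=0.1, epochs=100):
    """Fit logistic regression with full-batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, X, y):
    """Fraction of examples whose label matches the sign of the linear score."""
    hits = 0
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
        hits += int(pred == yi)
    return hits / len(X)


def run_demo(seed=0):
    """Train on 75% of the synthetic data and score the held-out 25%."""
    X, y = make_synthetic_activations(seed=seed)
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.75 * len(idx))
    train, test = idx[:cut], idx[cut:]
    w, b = train_linear_probe([X[i] for i in train], [y[i] for i in train])
    acc = probe_accuracy(w, b, [X[i] for i in test], [y[i] for i in test])
    return w, b, acc


if __name__ == "__main__":
    _, _, acc = run_demo()
    print(f"held-out probe accuracy: {acc:.3f}")
```

With real models, the synthetic generator would be replaced by activations extracted at a chosen layer for each text, and the probe would typically be repeated per layer to see where the signal is strongest. Held-out accuracy well above chance suggests the label is linearly recoverable; it does not by itself show that the model uses that direction.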
Taxonomy
Topics: AI in Service Interactions · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education
