Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas
Seungjong Sun, Eungu Lee, Seo Yeon Baek, Seunghyun Hwang, Wonbyung Lee, Dongyan Nan, Bernard J. Jansen, Jang Hyun Kim

TL;DR
This paper investigates how multi-modal large language models can adapt their negotiation behaviors based on assigned visual personas, revealing that LLMs assess visual traits like aggressiveness similarly to humans and adjust their responses accordingly.
Contribution
It introduces a novel dataset of 5K avatar images and demonstrates that LLMs can align their behaviors with visual personas, filling a gap in multi-modal persona research.
Findings
LLMs evaluate visual aggressiveness similarly to humans.
LLMs exhibit more aggressive behaviors with aggressive visual personas.
Behavioral adjustments depend on the relative aggressiveness of opponent images.
Abstract
This study is the first to explore whether multi-modal large language models (LLMs) can align their behaviors with visual personas, addressing a significant gap in the literature, which predominantly focuses on text-based personas. We developed a novel dataset of 5K fictional avatar images for assignment as visual personas to LLMs, and analyzed their negotiation behaviors based on the visual traits depicted in these images, with a particular focus on aggressiveness. The results indicate that LLMs assess the aggressiveness of images in a manner similar to humans and exhibit more aggressive negotiation behaviors when prompted with an aggressive visual persona. Interestingly, the LLM exhibited more aggressive negotiation behaviors when the opponent's image appeared less aggressive than its own, and less aggressive behaviors when the opponent's image appeared more aggressive.
Taxonomy
Topics: Persona Design and Applications · Multimodal Machine Learning Applications · Digital Storytelling and Education
Methods: ALIGN · Focus
