The Social Gaze of LLMs: A Literature Review of Multimodal Approaches to Human Behavior Understanding

Zihan Liu; Parisa Rabbani; Veda Duddu; Kyle Fan; Madison Lee; Yun Huang

arXiv:2510.23947 · cs.HC · November 19, 2025

TL;DR

This literature review analyzes how multimodal large language models interpret human behavior, highlighting current practices, limitations, and ethical considerations, and proposes a research agenda for more socially competent systems.

Contribution

The paper systematically reviews 176 studies, identifies gaps in adaptive reasoning, evaluation methods, and ethical focus, and suggests directions for future research on socially aware multimodal systems.

Findings

01. Predominant use of pattern recognition and information extraction.

02. Limited support for adaptive, interactive reasoning.

03. Evaluation mainly relies on static benchmarks, with few human-centered assessments.

Abstract

LLM-powered multimodal systems are increasingly used to interpret human behavior, yet how researchers apply the models' 'social competence' remains poorly understood. This paper presents a systematic literature review of 176 publications across different application domains (e.g., healthcare, education, and entertainment). Using a four-dimensional coding framework (application, technical, evaluative, and ethical), we find (1) frequent use of pattern recognition and information extraction from multimodal sources, but limited support for adaptive, interactive reasoning; (2) a dominant 'modality-to-text' pipeline that privileges language over rich audiovisual cues, stripping away nuanced social cues; (3) evaluation practices reliant on static benchmarks, with socially grounded, human-centered assessments rare; and (4) ethical discussions focused mainly on legal and rights-related risks…
