Simple 3D Pose Features Support Human and Machine Social Scene Understanding

Wenshuo Qin; Leyla Isik

arXiv:2511.03988 · cs.CV · February 23, 2026

Open Access

TL;DR

This paper shows that simple 3D pose features extracted from short video clips predict human social judgments better than most of the vision deep neural networks (DNNs) tested, and that adding these features improves DNN alignment with human judgments.

Contribution

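It introduces a novel pose and depth estimation pipeline for extracting 3D body joint positions from video, and shows that a minimal feature set describing only the 3D position and direction of people accounts for the predictive performance of the full body joint set, which itself outperforms most of the DNNs compared.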

Findings

01. 3D body joints predict social judgments better than most DNNs.

02. Minimal 3D features explain the prediction performance of full body joint sets.

03. 3D pose features improve DNN alignment with human social judgments.
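To make "predict social judgments" concrete, here is a minimal sketch of one common way to score a feature set: cross-validated ridge regression from per-video features to human ratings, evaluated by Pearson correlation on held-out videos. This is an illustration, not the authors' procedure. The function names (`ridge_fit`, `cv_prediction_score`), the penalty `alpha`, the fold count, and the choice of correlation as the metric are all assumptions made for this example.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on mean-centered data (hypothetical helper)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T (y - y_mean)
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, x_mean, y_mean

def cv_prediction_score(X, y, alpha=1.0, n_folds=5, seed=0):
    """Pearson r between held-out predictions and ratings.

    X: (n_videos, n_features) features, e.g. pose features or DNN embeddings.
    y: (n_videos,) human ratings for one social judgment.
    alpha, n_folds, seed: illustrative defaults, not values from the paper.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    order = np.random.default_rng(seed).permutation(len(y))
    preds = np.empty(len(y))
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        w, x_mean, y_mean = ridge_fit(X[train_idx], y[train_idx], alpha)
        preds[test_idx] = (X[test_idx] - x_mean) @ w + y_mean
    return np.corrcoef(preds, y)[0, 1]
```

Under this setup, comparing feature sets amounts to calling `cv_prediction_score` once per feature matrix with the same ratings and the same folds, then comparing the resulting scores.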

Abstract

Humans effortlessly recognize social interactions from visual input, yet the underlying computations remain unknown, and social interaction recognition challenges even the most advanced deep neural networks (DNNs). Here, we hypothesized that humans rely on 3D visuospatial pose information to make social judgments, and that this information is largely absent from most vision DNNs. To test these hypotheses, we used a novel pose and depth estimation pipeline to automatically extract 3D body joint positions from short video clips. We compared the ability of these body joints to predict human social judgments in the videos with embeddings from over 350 vision DNNs. We found that body joints predicted social judgments better than most DNNs. We then reduced the 3D body joints to an even more compact feature set describing only the 3D position and direction of people in the videos. We found…
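As an illustration of what a compact "3D position and direction" description could look like, the sketch below reduces one person's 3D joints to a torso centroid and a unit facing vector. It is not the paper's pipeline: the joint indices, the use of shoulders and hips, and the cross-product definition of facing direction are assumptions made for this example, and the sign of the facing vector depends on the coordinate convention of the pose estimator.

```python
import numpy as np

def compact_pose_features(joints, l_shoulder=0, r_shoulder=1, l_hip=2, r_hip=3):
    """Reduce 3D joints to position + facing direction (hypothetical helper).

    joints: (n_frames, n_joints, 3) array of 3D joint positions for one person.
    The joint index arguments are placeholders; real skeleton formats differ.
    Returns an (n_frames, 6) array: [x, y, z, dir_x, dir_y, dir_z].
    """
    joints = np.asarray(joints, dtype=float)
    # Position: centroid of the four torso joints.
    torso = joints[:, [l_shoulder, r_shoulder, l_hip, r_hip], :]
    position = torso.mean(axis=1)
    # Facing direction: normal to the torso plane, from the shoulder axis
    # and the hip-to-shoulder axis.
    across = joints[:, r_shoulder] - joints[:, l_shoulder]
    shoulder_mid = (joints[:, l_shoulder] + joints[:, r_shoulder]) / 2
    hip_mid = (joints[:, l_hip] + joints[:, r_hip]) / 2
    facing = np.cross(across, shoulder_mid - hip_mid)
    norm = np.linalg.norm(facing, axis=1, keepdims=True)
    facing = facing / np.clip(norm, 1e-8, None)  # guard against degenerate poses
    return np.concatenate([position, facing], axis=1)
```

For a two-person clip, one option would be to compute these features per person and concatenate them, possibly averaged over frames, before relating them to human ratings.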

Taxonomy

Topics: Human Pose and Action Recognition · Action Observation and Synchronization · Emotion and Mood Recognition