Learning Triadic Belief Dynamics in Nonverbal Communication from Videos

Lifeng Fan, Shuwen Qiu, Zilong Zheng, Tao Gao, Song-Chun Zhu, Yixin Zhu

arXiv:2104.02841 · cs.CV · April 8, 2021

TL;DR

This paper introduces a novel model that captures nonverbal cues and belief dynamics among agents in videos, enabling better understanding of social interactions and improved video summarization.

Contribution

It presents a new hierarchical energy-based model that infers agents' beliefs and the true world state from nonverbal cues, forming a 'common mind' and advancing scene understanding in social contexts.

Findings

01. Improved video summarization on social interaction videos
02. Effective modeling of belief dynamics and nonverbal cues
03. Outperforms state-of-the-art keyframe methods

Abstract

Humans possess a unique social cognition capability; nonverbal communication can convey rich social information among agents. In contrast, such crucial social characteristics are mostly missing from the existing scene understanding literature. In this paper, we incorporate different nonverbal communication cues (e.g., gaze, human poses, and gestures) to represent, model, learn, and infer agents' mental states from pure visual inputs. Crucially, such a mental representation takes each agent's belief into account: it represents the true world state and infers the beliefs in each agent's mental state, which may differ from the true world state. By aggregating different beliefs and true world states, our model essentially forms "five minds" during the interactions between two agents. This "five minds" model differs from prior works that infer beliefs in an infinite recursion;…
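As a rough illustration of the representation the abstract describes, the sketch below stores a fixed, finite set of minds for two agents and reports where each mind diverges from the true world state. The five slot names (m1, m2, m12, m21, mc), the dictionary-based world state, and the `divergences` helper are hypothetical choices made for this sketch; the abstract above does not define the individual minds, and this is not the authors' hierarchical energy-based model or the code in the official repository.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical world-state encoding: object name -> location label.
WorldState = Dict[str, str]

# The five slot names are an assumption for this sketch, not taken from the abstract.
MIND_NAMES = ("m1", "m2", "m12", "m21", "mc")


@dataclass
class FiveMinds:
    """Fixed, finite belief container for one two-agent interaction (illustrative).

    Assumed slots:
      m1, m2   : agent 1's and agent 2's own beliefs about the world
      m12, m21 : agent 1's belief about agent 2's belief, and vice versa
      mc       : a common mind shared by both agents
    No deeper nesting (e.g. m121) is stored, so there is no infinite recursion.
    """
    m1: WorldState = field(default_factory=dict)
    m2: WorldState = field(default_factory=dict)
    m12: WorldState = field(default_factory=dict)
    m21: WorldState = field(default_factory=dict)
    mc: WorldState = field(default_factory=dict)

    def divergences(self, true_state: WorldState) -> Dict[str, WorldState]:
        """Per mind, collect the objects whose believed location differs from the true state."""
        result: Dict[str, WorldState] = {}
        for name in MIND_NAMES:
            mind = getattr(self, name)
            differing = {obj: loc for obj, loc in mind.items() if true_state.get(obj) != loc}
            if differing:
                result[name] = differing
        return result


# Toy false-belief scenario: the cup was moved to the box while agent 2 looked away.
true_state = {"cup": "box"}
minds = FiveMinds(
    m1={"cup": "box"},      # agent 1 saw the move
    m2={"cup": "basket"},   # agent 2 missed it
    m12={"cup": "basket"},  # agent 1 knows agent 2 missed it
    m21={"cup": "basket"},  # agent 2 assumes agent 1 shares its view
    mc={"cup": "basket"},   # hypothetical: last state both agents jointly attended to
)
print(minds.divergences(true_state))
```

Note that m12 is reported even though it correctly models agent 2's belief: `divergences` compares every mind only against the true world state, which is the belief-versus-reality gap the abstract emphasizes. Keeping a fixed number of slots is one simple way to avoid the infinite belief recursion the abstract contrasts against.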

Code & Models

Repositories

LifengFan/Triadic-Belief-Dynamics
PyTorch · Official

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Human Pose and Action Recognition