M3PT: A Transformer for Multimodal, Multi-Party Social Signal Prediction with Person-aware Blockwise Attention
Yiming Tang, Abrar Anwar, Jesse Thomason

TL;DR
This paper introduces M3PT, a causal transformer model designed to predict multimodal social signals in multi-party conversations, improving accuracy by integrating multiple social cues across participants and time.
Contribution
M3PT is the first unified transformer model that captures multimodal, multi-party social signals, using modality and temporal blockwise attention masking.
Findings
Multimodal input improves bite timing prediction.
M3PT outperforms prior models on social signal prediction.
Using multiple modalities improves speaking status prediction.
Abstract
Understanding social signals in multi-party conversations is important for human-robot interaction and artificial social intelligence. Social signals include body pose, head pose, speech, and context-specific activities like acquiring and taking bites of food when dining. Past work in multi-party interaction tends to build task-specific models for predicting social signals. In this work, we address the challenge of predicting multimodal social signals in multi-party settings in a single model. We introduce M3PT, a causal transformer architecture with modality and temporal blockwise attention masking to simultaneously process multiple social cues across multiple participants and their temporal interactions. We train and evaluate M3PT on the Human-Human Commensality Dataset (HHCD), and demonstrate that using multiple modalities improves bite timing and speaking status prediction. Source…
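The abstract names a causal transformer with blockwise attention masking but does not spell out the mask itself. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes tokens are laid out time-major (every person-modality token for step 0, then step 1, and so on) and that a token may attend to any token in its own or an earlier time step. The function name `blockwise_causal_mask` and its parameters are hypothetical.

```python
def blockwise_causal_mask(num_steps, num_people, num_modalities):
    """Build a square boolean attention mask.

    mask[q][k] is True when query token q may attend to key token k.
    Assumed layout (not from the paper): tokens are ordered time-major,
    so each time step contributes one contiguous block of
    num_people * num_modalities tokens.
    """
    block = num_people * num_modalities  # tokens per time step
    n = num_steps * block                # total sequence length
    # Token i belongs to time step i // block. Attention is allowed
    # within the same block and to all earlier blocks, never to later ones.
    return [[(k // block) <= (q // block) for k in range(n)] for q in range(n)]


# Example: 3 time steps, 2 participants, 2 modalities -> 12 tokens.
mask = blockwise_causal_mask(num_steps=3, num_people=2, num_modalities=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Unlike a strictly token-wise causal mask, this variant lets all participants and modalities within a time step see each other, while still hiding future steps. A person-aware or modality-specific variant would further restrict entries inside each block.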
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Emotion and Mood Recognition · Time Series Analysis and Forecasting
Methods: Softmax · Attention Is All You Need
