Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions

David Curto; Albert Clapés; Javier Selva; Sorina Smeureanu; Julio C. S. Jacques Junior; David Gallardo-Pujol; Georgina Guilera; David Leiva; Thomas B. Moeslund; Sergio Escalera; Cristina Palmero

arXiv:2109.09487 · cs.CV · September 21, 2021

TL;DR

Dyadformer is a novel multi-modal Transformer architecture designed to model long-range dyadic interactions, capturing individual and interpersonal features over extended periods to improve personality inference accuracy.

Contribution

It introduces a multi-modal, multi-subject Transformer with a cross-subject layer for explicit interaction modeling, enabling long-term analysis of dyadic interactions.

Findings

01 Improves state-of-the-art personality inference results on the UDIVA v0.5 dataset.

02 Models long-term interdependencies in dyadic interactions by using variable time windows.

03 Uses multiple modalities and joint modeling of both interactants to improve prediction of individual attributes.

Abstract

Personality computing has become an emerging topic in computer vision, due to its wide range of applications. However, most works on the topic have focused on analyzing the individual, even when applied to interaction scenarios, and for short periods of time. To address these limitations, we present the Dyadformer, a novel multi-modal multi-subject Transformer architecture to model individual and interpersonal features in dyadic interactions using variable time windows, thus allowing the capture of long-term interdependencies. Our proposed cross-subject layer allows the network to explicitly model interactions among subjects through attentional operations. This proof-of-concept approach shows how multi-modality and joint modeling of both interactants for longer periods of time help to predict individual attributes. With Dyadformer, we improve state-of-the-art…
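
The abstract describes a cross-subject layer that models interactions among subjects through attentional operations. As a rough illustration of that general idea, the sketch below shows plain single-head cross-attention, where one subject's sequence supplies the queries and the other subject's sequence supplies the keys and values. It is not the paper's implementation: the function name `cross_attention`, the feature sizes, the random inputs, and the shared projection weights are all assumptions made for this example, and modality fusion is left out entirely.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries_from, context_from, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries_from: (T_q, d) features of the subject being updated.
    context_from: (T_c, d) features of the other subject.
    Returns the attended features (T_q, d_v) and the attention weights (T_q, T_c).
    """
    q = queries_from @ w_q
    k = context_from @ w_k
    v = context_from @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Hypothetical inputs: two interactants with different numbers of time steps.
rng = np.random.default_rng(0)
d_model = 8
subject_a = rng.normal(size=(5, d_model))
subject_b = rng.normal(size=(7, d_model))

# Assumption: projection weights are shared between both directions.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))

# Each subject attends to the other one.
a_attends_b, weights_ab = cross_attention(subject_a, subject_b, w_q, w_k, w_v)
b_attends_a, weights_ba = cross_attention(subject_b, subject_a, w_q, w_k, w_v)

print(a_attends_b.shape, weights_ab.shape)
print(b_attends_a.shape, weights_ba.shape)
```

Each row of `weights_ab` is a probability distribution over the time steps of the other subject, so the updated representation of one interactant is a weighted mix of the other interactant's features. Because the two sequences may differ in length, the same mechanism applies to time windows of varying size.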

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Dense Connections · Label Smoothing · Multi-Head Attention · Byte Pair Encoding · Softmax