Data Augmentation for Human Behavior Analysis in Multi-Person Conversations

Kun Li; Dan Guo; Guoliang Chen; Feiyang Liu; Meng Wang

arXiv:2308.01526·cs.CV·August 4, 2023

TL;DR

This paper presents the HFUT-VUT team's solution to the MultiMediate Grand Challenge 2023: a Swin Transformer baseline combined with video cropping and data augmentation, applied to bodily behavior recognition, eye contact detection, and next speaker prediction.

Contribution

The paper combines a Swin Transformer baseline with cropping of the raw video and data augmentation to address three multi-person conversation analysis tasks.

Findings

01

Achieved 0.6262 mean average precision in bodily behavior recognition.

02

Attained 0.7771 accuracy in eye contact detection.

03

Reached 0.5281 unweighted average recall in next speaker prediction.
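
For reference, unweighted average recall (the metric in the third finding) is the mean of the per-class recalls, so every class counts equally regardless of how often it occurs. The sketch below is a minimal illustration of that definition next to plain accuracy; it is not the challenge's evaluation code, and the function names and toy labels are hypothetical.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class present in y_true counts equally."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, imbalanced labels (hypothetical): a model that always predicts class 0.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
print(accuracy(y_true, y_pred))                   # 0.75
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

On this toy sample the always-0 predictor looks reasonable by accuracy but scores only 0.5 in unweighted average recall, which is why the latter suits imbalanced tasks such as predicting whether a participant speaks next.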

Abstract

In this paper, we present the solution of our team HFUT-VUT for the MultiMediate Grand Challenge 2023 at ACM Multimedia 2023. The solution covers three sub-challenges: bodily behavior recognition, eye contact detection, and next speaker prediction. We select Swin Transformer as the baseline and exploit data augmentation strategies to address these three tasks. Specifically, we crop the raw video to remove noise from irrelevant regions, and we use data augmentation to improve the generalization of the model. As a result, our solution achieves the best results on the corresponding test set: a mean average precision of 0.6262 for bodily behavior recognition and an accuracy of 0.7771 for eye contact detection. In addition, our approach achieves a comparable result of 0.5281 for next speaker prediction in terms of unweighted average recall.
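
The abstract describes two preprocessing ideas: cropping the raw video and applying data augmentation. The sketch below illustrates the general pattern on a single frame represented as a nested list of pixel values. It is an assumption-laden illustration, not the authors' pipeline: the abstract does not say which augmentations were used, so the random horizontal flip is a stand-in, and the function names, the crop box format, and the `flip_prob` parameter are all hypothetical.

```python
import random


def crop(frame, top, left, height, width):
    """Keep only the rectangular region of interest of a 2D frame."""
    return [row[left:left + width] for row in frame[top:top + height]]


def horizontal_flip(frame):
    """Mirror the frame left to right."""
    return [row[::-1] for row in frame]


def augment(frame, box, rng, flip_prob=0.5):
    """Crop to `box` = (top, left, height, width), then flip with probability flip_prob.

    The flip is an assumed example augmentation, not one named in the abstract.
    """
    top, left, height, width = box
    region = crop(frame, top, left, height, width)
    if rng.random() < flip_prob:
        region = horizontal_flip(region)
    return region


# Toy 3x4 frame whose pixel values encode their position (hypothetical data).
frame = [[r * 4 + c for c in range(4)] for r in range(3)]
rng = random.Random(0)
print(augment(frame, (1, 1, 2, 2), rng, flip_prob=0.0))  # [[5, 6], [9, 10]]
print(augment(frame, (1, 1, 2, 2), rng, flip_prob=1.0))  # [[6, 5], [10, 9]]
```

In a real video pipeline the same crop box and the same flip decision would typically be applied to every frame of a clip so that the clip stays temporally consistent.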

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Stochastic Depth · Linear Layer · Dense Connections · Adam · Label Smoothing · Dropout · Absolute Position Encodings