Data Augmentation for Human Behavior Analysis in Multi-Person Conversations
Kun Li, Dan Guo, Guoliang Chen, Feiyang Liu, Meng Wang

TL;DR
This paper pairs a Swin Transformer baseline with video cropping and data augmentation for multi-person conversation analysis, covering bodily behavior recognition, eye contact detection, and next speaker prediction.
Contribution
The paper describes team HFUT-VUT's solution for the MultiMediate Grand Challenge 2023: a Swin Transformer baseline trained on cropped video with data augmentation, applied to all three sub-challenges.
Findings
Achieved 0.6262 mean average precision in bodily behavior recognition.
Attained 0.7771 accuracy in eye contact detection.
Reached 0.5281 unweighted average recall in next speaker prediction.
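The three figures above use different metrics, so they are not directly comparable. A minimal sketch of how each metric is commonly defined follows, in plain Python. The function names and the toy inputs are illustrative assumptions, and the challenge's official evaluation scripts may differ in details such as tie handling or interpolation.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall; every class counts equally regardless of its size."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP for one class: mean of the precision measured at each positive's rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(label_matrix, score_matrix) -> float:
    """Average AP over classes; rows are samples, columns are classes."""
    n_classes = len(label_matrix[0])
    aps = [
        average_precision([row[c] for row in label_matrix],
                          [row[c] for row in score_matrix])
        for c in range(n_classes)
    ]
    return sum(aps) / n_classes
```

Unweighted average recall is a natural fit for imbalanced labels: a model that always predicts the majority class can score high accuracy but gets zero recall on every other class, which pulls the average down.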
Abstract
In this paper, we present the solution of our team HFUT-VUT for the MultiMediate Grand Challenge 2023 at ACM Multimedia 2023. The solution covers three sub-challenges: bodily behavior recognition, eye contact detection, and next speaker prediction. We select Swin Transformer as the baseline and exploit data augmentation strategies to address the above three tasks. Specifically, we crop the raw video to remove the noise from other parts. At the same time, we utilize data augmentation to improve the generalization of the model. As a result, our solution achieves the best results of 0.6262 for bodily behavior recognition in terms of mean average precision and the accuracy of 0.7771 for eye contact detection on the corresponding test set. In addition, our approach also achieves comparable results of 0.5281 for the next speaker prediction in terms of unweighted average recall.
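The abstract does not say which crop regions or augmentations were used. As a rough illustration of the two preprocessing ideas it names, the sketch below crops every frame of a clip to one fixed box and then applies a single random horizontal flip to the whole clip. The flip, the box format, and the names `crop_frame` and `augment_clip` are assumptions made for illustration, not the authors' pipeline.

```python
import random
from typing import List, Tuple

Frame = List[List[int]]  # one grayscale frame as rows of pixel values (illustrative)


def crop_frame(frame: Frame, top: int, left: int, height: int, width: int) -> Frame:
    """Keep only the rectangle assumed to contain the target participant."""
    return [row[left:left + width] for row in frame[top:top + height]]


def augment_clip(clip: List[Frame], box: Tuple[int, int, int, int],
                 rng: random.Random, p: float = 0.5) -> List[Frame]:
    """Crop each frame to the same box, then flip the whole clip with probability p.

    One flip decision is shared by all frames so the clip stays temporally consistent.
    """
    top, left, height, width = box
    cropped = [crop_frame(f, top, left, height, width) for f in clip]
    if rng.random() < p:
        return [[row[::-1] for row in f] for f in cropped]
    return cropped


# Usage: a two-frame toy clip, cropped to a 2x2 box starting at row 1, column 1.
frame = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
clip = [frame, frame]
out = augment_clip(clip, box=(1, 1, 2, 2), rng=random.Random(0))
```

Augmentation of this kind would be applied only at training time; at evaluation time the clip would be cropped but not flipped.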
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Stochastic Depth · Linear Layer · Dense Connections · Adam · Label Smoothing · Dropout · Absolute Position Encodings
