SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild

Jean Kossaifi; Robert Walecki; Yannis Panagakis; Jie Shen; Maximilian Schmitt; Fabien Ringeval; Jing Han; Vedhas Pandit; Antoine Toisoul; Bjorn Schuller; Kam Star; Elnar Hajiyev; Maja Pantic

arXiv:1901.02839·cs.HC·November 20, 2019

TL;DR

The SEWA database provides a comprehensive, culturally diverse, and richly annotated audio-visual dataset for advancing in-the-wild emotion and sentiment analysis in human-computer interaction.

Contribution

This paper introduces the SEWA database, a large-scale, multi-cultural, and richly annotated audio-visual dataset for emotion and sentiment research in real-world settings.

Findings

01. Rich annotations, including facial landmarks, facial action units (FAUs), vocalisations, and continuous affect measures.

02. Baseline experiments demonstrate the dataset's utility for automatic FAU detection and affect estimation.

03. Diverse cultural and contextual data enhance robustness of emotion recognition models.

Abstract

Natural human-computer interaction and audio-visual human behaviour sensing systems that achieve robust performance in the wild are needed more than ever, as digital devices are increasingly becoming an indispensable part of our lives. Accurately annotated real-world data are the crux of devising such systems. However, existing databases usually consider controlled settings, low demographic variability, and a single task. In this paper, we introduce the SEWA database of more than 2000 minutes of audio-visual data of 398 people from six cultures, 50% female, uniformly spanning the age range of 18 to 65 years. Subjects were recorded in two different contexts: while watching adverts and while discussing adverts in a video chat. The database includes rich annotations of the recordings in terms of facial landmarks, facial action units (FAU), various vocalisations,…

Code & Models

Repositories

pwc-1/Paper-10/tree/main/sew_d
mindspore
