# Speech, Head, and Eye-based Cues for Continuous Affect Prediction

**Authors:** Jonny O'Dwyer

arXiv: 1907.09919 · 2020-01-24

## TL;DR

This paper investigates the effectiveness of head, eye, and speech cues, including handcrafted and CNN-learned features, for improving continuous affect prediction in multimodal affective computing systems.

## Contribution

The work comprehensively explores head- and eye-based features, and their combination with speech, for continuous affect prediction, addressing a gap in multimodal affective computing research.

## Key findings

- Head and eye cues significantly improve affect prediction accuracy.
- CNN-learned features outperform handcrafted features.
- Multimodal combination yields the best results.

## Abstract

Continuous affect prediction involves the discrete time-continuous regression of affect dimensions. Dimensions to be predicted often include arousal and valence. Continuous affect prediction researchers are now embracing multimodal model input, which provides motivation to investigate previously unexplored affective cues. Speech-based cues have traditionally received the most attention for affect prediction. However, non-verbal inputs have significant potential to increase the performance of affective computing systems and, in addition, allow affect modelling in the absence of speech. Non-verbal inputs that have received little attention for continuous affect prediction include eye and head-based cues. The eyes are involved in emotion displays and perception, while head-based cues have been shown to contribute to emotion conveyance and perception. Additionally, these cues can be estimated non-invasively from video using modern computer vision tools. This work addresses this gap by comprehensively investigating head and eye-based features and their combination with speech for continuous affect prediction. Hand-crafted, automatically generated, and CNN-learned features from these modalities will be investigated. The highest performing feature sets and feature set combinations will show how effective these features are for predicting an individual's affective state.
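
To make the idea of combining modalities for frame-wise regression concrete, the following is a minimal sketch, not the paper's method. It assumes feature-level (early) fusion of per-frame speech, head, and eye feature matrices followed by a ridge regressor for a single affect dimension. The function names (`fuse_features`, `fit_ridge`, `predict`), the feature dimensions, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fuse_features(*modalities):
    """Concatenate per-frame feature matrices (frames x dims) along the feature axis."""
    frame_counts = {m.shape[0] for m in modalities}
    if len(frame_counts) != 1:
        raise ValueError("all modalities must share the same number of frames")
    return np.hstack(modalities)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised bias term."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # do not penalise the bias weight
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)


def predict(X, w):
    """Predict one affect value per frame from fused features."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


# Synthetic stand-ins for real features (assumption: 200 aligned frames).
rng = np.random.default_rng(0)
n_frames = 200
speech = rng.normal(size=(n_frames, 8))  # hypothetical speech descriptors
head = rng.normal(size=(n_frames, 3))    # hypothetical head-pose features
eye = rng.normal(size=(n_frames, 4))     # hypothetical eye-based features

X = fuse_features(speech, head, eye)

# Synthetic arousal trace: a noisy linear function of the fused features.
true_w = rng.normal(size=X.shape[1])
arousal = X @ true_w + 0.1 * rng.normal(size=n_frames)

w = fit_ridge(X, arousal, alpha=1e-3)
pred = predict(X, w)
```

In practice the three matrices would come from audio and video feature extractors aligned to a common frame rate, and a separate regressor would typically be fitted for each affect dimension such as arousal and valence.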

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.09919/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1907.09919/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/1907.09919/full.md

---
Source: https://tomesphere.com/paper/1907.09919