The Pixels and Sounds of Emotion: General-Purpose Representations of Arousal in Games
Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis

TL;DR
This paper investigates whether deep learning models can predict player arousal from audiovisual game footage alone. It reports up to 85% accuracy across four dissimilar games and argues that general-purpose affective representations are feasible.
Contribution
The paper evaluates deep classifiers and deep preference learning algorithms for predicting player arousal from the pixels and audio of gameplay video alone, and shows that general-purpose models can be effective across different game types.
Findings
Arousal prediction reaches up to 85% accuracy.
Models generalize across dissimilar games.
Audiovisual features contain sufficient information for affect detection.
Abstract
What if emotion could be captured in a general and subject-agnostic fashion? Is it possible, for instance, to design general-purpose representations that detect affect solely from the pixels and audio of a human-computer interaction video? In this paper we address the above questions by evaluating the capacity of deep learned representations to predict affect by relying only on audiovisual information of videos. We assume that the pixels and audio of an interactive session embed the necessary information required to detect affect. We test our hypothesis in the domain of digital games and evaluate the degree to which deep classifiers and deep preference learning algorithms can learn to predict the arousal of players based only on the video footage of their gameplay. Our results from four dissimilar games suggest that general-purpose representations can be built across games as the…
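The abstract contrasts deep classifiers with deep preference learning. The sketch below illustrates the general idea of a pairwise preference objective: rather than labelling each clip as high or low arousal, the model is trained so that the clip annotated as more arousing receives the higher score. This is only an illustration under stated assumptions. The logistic (RankNet-style) loss, the function names, and the toy scores are hypothetical and are not taken from the paper, whose exact loss and architecture are not given in this summary.

```python
import numpy as np


def pairwise_ranking_loss(score_a, score_b, a_preferred):
    """Mean logistic loss over pairs of predicted arousal scores.

    score_a, score_b: model outputs for two gameplay clips (hypothetical).
    a_preferred: 1.0 where clip A was annotated as more arousing, else 0.0.
    Assumption: a RankNet-style loss, not necessarily the paper's objective.
    """
    score_a = np.asarray(score_a, dtype=float)
    score_b = np.asarray(score_b, dtype=float)
    a_preferred = np.asarray(a_preferred, dtype=float)
    diff = score_a - score_b
    # -log(sigmoid(x)) equals logaddexp(0, -x), which is numerically stable.
    loss = (a_preferred * np.logaddexp(0.0, -diff)
            + (1.0 - a_preferred) * np.logaddexp(0.0, diff))
    return float(loss.mean())


def pairwise_accuracy(score_a, score_b, a_preferred):
    """Fraction of pairs whose predicted order matches the annotated order."""
    predicted = (np.asarray(score_a, dtype=float)
                 > np.asarray(score_b, dtype=float)).astype(float)
    return float((predicted == np.asarray(a_preferred, dtype=float)).mean())


# Toy usage: two pairs, the first ordered A > B, the second B > A.
scores_a = [0.9, 0.2]
scores_b = [0.1, 0.8]
labels = [1.0, 0.0]
print(pairwise_ranking_loss(scores_a, scores_b, labels))
print(pairwise_accuracy(scores_a, scores_b, labels))
```

In a full pipeline the scores would come from a network applied to video frames and audio; here they are fixed numbers so that the objective can be inspected in isolation.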
