A short review and primer on the use of human voice in human computer interaction applications
Kristian Lukander

TL;DR
This paper provides a short review and primer on using features of the human voice, such as intonation and prosody, for non-intrusive recognition of emotional and psychophysiological states in everyday human-computer interaction applications.
Contribution
It offers an accessible overview of core concepts and recent advances in speech-based psychophysiological analysis tailored for HCI applications.
Findings
Speech features can reveal emotional states and cognitive load.
Voice analysis offers a promising non-intrusive method for context understanding.
The paper emphasizes practical application guidelines for HCI.
Abstract
The application of psychophysiology in human-computer interaction is a growing field with significant potential for future smart personalised systems. Working in this emerging field requires comprehension of an array of physiological signals and analysis techniques. Human speech affords, alongside linguistic content, rich information in the intonation, voice quality, prosody, and rhythmic variation of utterances, allowing listeners to recognise numerous distinct emotional states in the speaker. Several types of factors affect speech, ranging from emotions to cognitive load and pathological conditions, providing a promising non-intrusive source for online understanding of context and psychophysiological state. This paper aims to serve as a primer for the novice, enabling rapid familiarisation with the latest core concepts. We put special emphasis on everyday human-computer…
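To make "intonation" and "prosody" concrete: intonation is commonly tracked as the fundamental frequency (F0, perceived pitch) contour over time, and loudness as short-time energy. The sketch below is not the method of this paper. It is a minimal, standard-library illustration of one textbook technique, autocorrelation-based pitch estimation, applied to fixed-length frames. The function names, the 75 to 400 Hz search range, and the 640-sample frame / 320-sample hop (40 ms / 20 ms at 16 kHz) are illustrative assumptions, and real speech would also need voicing detection and smoothing.

```python
import math

def frame_rms(frame):
    """Root-mean-square energy of one frame (a loudness-related prosodic cue)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate the fundamental frequency of a frame by autocorrelation.

    Hypothetical helper: searches lags matching f0_max..f0_min Hz and returns
    sample_rate / best_lag, or None for a silent or non-periodic frame.
    """
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]            # remove DC offset
    if sum(s * s for s in x) == 0:
        return None                          # silent frame: no pitch
    min_lag = int(sample_rate / f0_max)      # shortest period considered
    max_lag = min(int(sample_rate / f0_min), len(x) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation; it shrinks with lag, which
        # discourages picking a multiple of the true period.
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else None

def pitch_contour(signal, sample_rate, frame_len=640, hop=320):
    """F0 estimate per frame, i.e. a crude intonation contour."""
    return [estimate_f0(signal[s:s + frame_len], sample_rate)
            for s in range(0, len(signal) - frame_len + 1, hop)]

# Usage on a synthetic 200 Hz "voice" (an assumption standing in for speech).
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(4000)]
contour = pitch_contour(tone, sr)
energy = frame_rms(tone[:640])
```

For a steady synthetic tone the contour stays flat near 200 Hz; in speech, rises, falls, and changes in range of this contour are the kind of intonation cues the abstract refers to.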
Taxonomy
Topics: Social Robot Interaction and HRI · Emotion and Mood Recognition · Speech and dialogue systems
