Narrative Review of Emotional Expression Support in XR: Psychophysiology of Speech-to-Text Interfaces
Sunday David Ubur, Denis Gracanin

TL;DR
This review analyzes recent progress and gaps in integrating emotional expression into speech-to-text interfaces in XR, highlighting emerging techniques and the need for real-time emotion-aware captioning to enhance immersive communication.
Contribution
It synthesizes current research on affective speech-to-text systems in XR, identifying key gaps and proposing future directions for emotion-responsive captioning interfaces.
Findings
Real-time STT tools lack affective nuance
Emerging approaches include animated captions and avatar visualization
Persistent gap in emotion-aware captioning in XR environments
Abstract
This narrative review examines recent advancements, limitations, and research gaps in integrating emotional expression into speech-to-text (STT) interfaces within extended reality (XR) environments. Drawing from 37 peer-reviewed studies published between 2020 and 2024, we synthesized literature across multiple domains, including affective computing, psychophysiology, captioning innovation, and immersive human-computer interaction. Thematic categories include communication enhancement technologies for Deaf and Hard of Hearing (DHH) users, emotive captioning strategies, visual and affective augmentation in AR/VR, speech emotion recognition, and the development of empathic systems. Despite the growing accessibility of real-time STT tools, such systems largely fail to convey affective nuance, limiting the richness of communication for DHH users and other caption consumers. This review…
Taxonomy
Topics: Virtual Reality Applications and Impacts
