From Multimodal Signals to Adaptive XR Experiences for De-escalation Training
Birgit Nierula, Karam Tomotaki-Dawoud, Daniel Johannes Meyer, Iryna Ignatieva, Mina Mottahedin, Thomas Koch, Sebastian Bosse

TL;DR
This paper describes a multimodal, real-time communication analysis system for adaptive VR de-escalation training. It integrates five parallel sensing and analysis streams, synchronized via Lab Streaming Layer, to assess users' conscious and unconscious communication cues and inform the system's responses.
Contribution
The paper introduces a synchronized multimodal analysis framework for XR training, grounded in social semiotics and symbolic interactionism, and reports preliminary results indicating its feasibility in law enforcement scenarios.
Findings
Multi-view sensing improves occlusion handling.
Fusion of signals enhances emotion recognition accuracy.
Preliminary results support system feasibility in real-world training.
Abstract
We present the early-stage design and implementation of a multimodal, real-time communication analysis system intended as a foundational interaction layer for adaptive VR training. The system integrates five parallel processing streams: (1) verbal and prosodic speech analysis, (2) skeletal gesture recognition from multi-view RGB cameras, (3) multimodal affective analysis combining lower-face video with upper-face facial EMG, (4) EEG-based mental state decoding, and (5) physiological arousal estimation from skin conductance, heart activity, and proxemic behavior. All signals are synchronized via Lab Streaming Layer to enable temporally aligned, continuous assessments of users' conscious and unconscious communication cues. Building on concepts from social semiotics and symbolic interactionism, we introduce an interpretation layer that links low-level signal representations to…
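The temporal alignment described in the abstract can be illustrated with a minimal sketch. It does not use the Lab Streaming Layer API and is not the authors' implementation. It assumes only that every stream already carries timestamps on one shared clock, and it resamples each stream onto a common time grid by nearest-sample lookup. The stream names, sample values, and the functions `nearest_sample` and `align_streams` are hypothetical.

```python
from bisect import bisect_left


def nearest_sample(timestamps, values, t):
    """Return the value whose timestamp is closest to t.

    Assumes timestamps are sorted in ascending order. On an exact tie
    the earlier sample is kept.
    """
    i = bisect_left(timestamps, t)
    if i == 0:
        return values[0]
    if i == len(timestamps):
        return values[-1]
    before, after = timestamps[i - 1], timestamps[i]
    return values[i] if (after - t) < (t - before) else values[i - 1]


def align_streams(streams, start, stop, step):
    """Resample every stream onto one shared time grid.

    streams maps a stream name to (timestamps, values), with all
    timestamps expressed on the same clock (an assumption of this sketch).
    Returns one dict per grid time.
    """
    rows = []
    n_steps = int(round((stop - start) / step)) + 1
    for k in range(n_steps):
        t = start + k * step
        row = {"t": t}
        for name, (ts, vals) in streams.items():
            row[name] = nearest_sample(ts, vals, t)
        rows.append(row)
    return rows


# Hypothetical toy data: three streams sampled at different rates.
streams = {
    "eda": ([0.0, 0.5, 1.0], [0.10, 0.20, 0.30]),
    "heart_rate": ([0.0, 1.0], [60, 62]),
    "gesture": ([0.2, 0.9], ["idle", "raise_hand"]),
}
aligned = align_streams(streams, start=0.0, stop=1.0, step=0.5)
for row in aligned:
    print(row)
```

Nearest-sample lookup keeps categorical streams such as gesture labels intact. Continuous signals such as skin conductance could instead be interpolated, at the cost of treating the two kinds of stream differently.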
