# Deep-Learning-Derived Facial Electromyogram Signatures of Emotion in Immersive Virtual Reality (bWell): Exploring the Impact of Emotional, Cognitive, and Physical Demands

**Authors:** Zohreh H. Meybodi, Francis Thibault, Budhachandra Khundrakpam, Gino De Luca, Jing Zhang, Joshua A. Granek, Nusrat Choudhury

PMC · DOI: 10.3390/s26061827 · Sensors (Basel, Switzerland) · 2026-03-13

## TL;DR

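A CNN–TCN deep learning model classifies four calibrated facial expressions in immersive VR from facial electromyogram (fEMG) signals across participants, and the expression features it produces on spontaneous task recordings are associated mainly with perceived physical demand.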

## Contribution

A CNN–TCN model enables cross-participant facial expression classification in VR using raw fEMG signals, with potential for real-time, privacy-preserving monitoring.

## Key findings

- A CNN–TCN model achieved strong performance (Macro-F1 = 0.88 ± 0.13; ROC-AUC = 0.95 ± 0.06) in classifying four facial expressions in immersive VR.
- Model-derived features showed scene-dependent patterns primarily associated with perceived physical demand (NASA-TLX).
- The framework supports privacy-preserving, continuous expression monitoring without scene-specific retraining.
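The Macro-F1 in the first finding comes from leave-one-participant-out (LOPO) evaluation, as stated in the abstract. The sketch below shows how such folds and a macro-averaged F1 can be computed. It is a minimal illustration, not the authors' code: the function names `lopo_splits` and `macro_f1` and the toy labels are assumptions made for this example.

```python
from collections import defaultdict


def lopo_splits(participant_ids):
    """Yield (held_out, train_idx, test_idx), holding out one participant per fold."""
    by_participant = defaultdict(list)
    for idx, pid in enumerate(participant_ids):
        by_participant[pid].append(idx)
    for held_out in sorted(by_participant):
        test_idx = by_participant[held_out]
        # All samples from every other participant form the training set.
        train_idx = [
            i
            for pid, idxs in by_participant.items()
            if pid != held_out
            for i in idxs
        ]
        yield held_out, train_idx, test_idx


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1; a class with no TP, FP, or FN scores 0."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy usage: five samples from three hypothetical participants.
folds = list(lopo_splits(["p1", "p1", "p2", "p3", "p3"]))
score = macro_f1(
    ["smile", "frown", "neutral", "smile"],
    ["smile", "frown", "smile", "smile"],
    classes=["smile", "frown", "neutral"],
)
```

Because every fold tests on a participant the model never saw during training, the spread of per-fold scores reflects between-person variability. The reported ± values presumably summarize that spread, though this summary does not say so explicitly.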

## Abstract

This study investigates whether spatio-temporal deep learning can analyze facial electromyogram (fEMG) signals recorded in immersive virtual reality (VR) environments. By examining the influence of emotional, cognitive, and physical demands, the research aims to capture distinct psychophysiological patterns and link them to affective and workload-related states in VR settings.

What are the main findings?

A CNN–TCN model trained on physiologically normalized multi-channel fEMG signals classified four calibrated facial expressions (smile, frown, raised eyebrow, neutral) in immersive VR, achieving strong leave-one-participant-out performance (Macro-F1 = 0.88 ± 0.13; ROC-AUC = 0.95 ± 0.06). This was achieved in a small study with only 12 participants, demonstrating the model’s potential and paving the way for further research with larger samples.

When applied to unlabeled fEMG recordings from previously unseen VR scenes, the trained model generated continuous expression classes, from which static and temporal features showed scene-dependent patterns. These features showed significant associations primarily with perceived physical demand (NASA-TLX), suggesting effective capture of expressions related to physical effort, while associations with cognitive or emotional demand were less pronounced.
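As an illustration of how a stream of predicted expression classes can be turned into static and temporal features, the sketch below computes class occupancy (static) plus transition count and mean run length (temporal). The specific features used in the paper are not listed in this summary, so these three, the function name `expression_features`, and the label strings are all assumptions.

```python
from itertools import groupby

# Hypothetical label strings for the four calibrated expressions.
CLASSES = ("smile", "frown", "raised_eyebrow", "neutral")


def expression_features(labels, classes=CLASSES):
    """Summarize a sequence of per-window predicted labels for one scene."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must be non-empty")
    # Collapse consecutive identical labels into (label, run_length) pairs.
    runs = [(label, sum(1 for _ in group)) for label, group in groupby(labels)]
    features = {"n_transitions": len(runs) - 1}
    for c in classes:
        run_lengths = [length for label, length in runs if label == c]
        # Static feature: fraction of windows assigned to class c.
        features[f"occupancy_{c}"] = sum(run_lengths) / n
        # Temporal feature: average number of consecutive windows spent in c.
        features[f"mean_run_{c}"] = (
            sum(run_lengths) / len(run_lengths) if run_lengths else 0.0
        )
    return features


scene_labels = [
    "neutral", "neutral", "smile", "smile", "smile",
    "neutral", "frown", "frown",
]
features = expression_features(scene_labels)
```

Computing one such feature vector per participant and scene gives values that can be compared across scenes or correlated with NASA-TLX subscale ratings.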

What are the implications of the main findings?

End-to-end spatio-temporal modeling of raw fEMG enables privacy-preserving facial expression sensing in immersive VR without handcrafted feature engineering or scene-specific retraining, using a single physiologically normalized model shared across participants. This demonstrates the feasibility of expression monitoring suitable for automated and potentially real-time deployment.
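The summary does not specify how the participant-level physiological normalization is performed. One common choice, shown here purely as an assumption, is to z-score each channel using that participant's own statistics so that a single shared model sees signals on a comparable scale. The function name and channel names below are hypothetical.

```python
from statistics import mean, pstdev


def normalize_participant(channels, eps=1e-8):
    """Z-score each channel of ONE participant using that participant's own stats.

    channels: dict mapping a channel name to a list of samples.
    eps guards against division by zero for a flat channel.
    """
    normalized = {}
    for name, samples in channels.items():
        mu = mean(samples)
        sigma = pstdev(samples)
        normalized[name] = [(x - mu) / (sigma + eps) for x in samples]
    return normalized


# Hypothetical two-channel recording; the second channel is flat.
recording = {
    "channel_a": [1.0, 2.0, 3.0],
    "channel_b": [10.0, 10.0, 10.0],
}
normalized = normalize_participant(recording)
```

A baseline-referenced variant, for example scaling by activity during the neutral calibration expression, would follow the same structure with `mu` and `sigma` taken from the baseline segment only.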

The convergence between model-derived expression dynamics and NASA-TLX workload ratings showcases the potential for reducing reliance on intermittent self-report measures in future VR applications. By bridging brief calibration-based learning with spontaneous task-elicited behavior, the framework supports continuous, physiologically grounded assessment that can complement, and in some contexts, partially substitute for, explicit questionnaires in training, performance evaluation, and user-experience research.

Emotional and workload-related states unfold dynamically during immersive virtual reality (VR) experiences, yet reliable physiological modeling in such environments remains challenging. We investigated whether multi-channel facial electromyography (fEMG), combined with spatio-temporal deep learning, can (i) accurately classify calibrated facial expressions across participants and (ii) transfer to spontaneous, task-elicited behavior in immersive VR. Twelve adults completed a calibration phase involving four intentional expressions (smile, frown, raised eyebrow, neutral), followed by VR scenes designed to elicit emotional, cognitive, physical, and dual task demands. After participant-level physiological normalization, a single shared Convolutional Neural Network–Temporal Convolutional Network (CNN–TCN) model was trained and evaluated using leave-one-participant-out (LOPO) validation. The model achieved strong cross-participant performance (Macro-F1 = 0.88 ± 0.13; ROC-AUC = 0.95 ± 0.06). When applied to unlabeled spontaneous VR task-elicited fEMG recordings, the trained model generated continuous expression classes. Derived static and temporal expression features showed scene-dependent modulation and False Discovery Rate (FDR)-surviving associations, primarily with perceived physical demand (NASA-TLX). The observed muscle activation patterns were physiologically plausible and aligned with Facial Action Coding System (FACS)-based interpretations of underlying muscle activity. These findings demonstrate that end-to-end spatio-temporal modeling of raw fEMG enables facial expression sensing in immersive VR using a single shared model following physiological normalization. The proposed framework bridges calibrated expression learning and spontaneous task-elicited behavior, supporting privacy-preserving, continuous and physiologically grounded monitoring in human-centered VR applications.
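The abstract reports FDR-surviving associations without naming the correction procedure in this summary. The sketch below assumes the widely used Benjamini–Hochberg step-up procedure; the function name and the example p-values are illustrative and are not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where a test survives BH FDR control."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Every hypothesis ranked at or below max_k is declared significant.
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            survives[idx] = True
    return survives


# Made-up p-values for four feature-by-subscale correlations.
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.20], alpha=0.05)
```

With many expression features tested against several NASA-TLX subscales, a correction of this kind limits the expected share of false positives among the associations reported as significant.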

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030317/full.md

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030317/full.md

## References

75 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030317/full.md

---
Source: https://tomesphere.com/paper/PMC13030317