# Emotion Recognition from rPPG via Physiologically Inspired Temporal Encoding and Attention-Based Curriculum Learning

**Authors:** Changmin Lee, Hyunwoo Lee, Mincheol Whang

PMC · DOI: 10.3390/s25133995 · Sensors (Basel, Switzerland) · 2025-06-26

## TL;DR

This paper introduces a temporal-only rPPG framework for emotion recognition that reaches 66.04% accuracy for arousal on MAHNOB-HCI but a lower 62.26% for valence, which the authors attribute to the limits of unimodal temporal signals.

## Contribution

A physiologically inspired deep learning framework for rPPG-based emotion recognition that combines a Multi-scale Temporal Dynamics Encoder (MTDE), adaptive sparse α-Entmax attention, Gated Temporal Pooling, and a three-phase curriculum learning strategy.
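
The idea behind a multi-scale temporal encoder such as MTDE can be illustrated with a minimal sketch: the same 1D signal is filtered in parallel at several kernel widths, and the resulting channels are stacked. This is not the authors' implementation. The function names, the kernel sizes, and the fixed averaging filters (standing in for learned convolution weights) are all assumptions made for illustration.

```python
import math


def conv1d_same(signal, kernel):
    """Cross-correlate signal with an odd-length kernel, zero-padded to keep the length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def multi_scale_encode(signal, kernel_sizes=(3, 7, 15)):
    """Return one filtered channel per kernel size (hypothetical sizes, odd only)."""
    channels = []
    for size in kernel_sizes:
        if size % 2 == 0:
            raise ValueError("kernel sizes must be odd to preserve alignment")
        # Assumption: a fixed averaging filter replaces the learned weights
        # that a trained CNN branch would use.
        kernel = [1.0 / size] * size
        channels.append(conv1d_same(signal, kernel))
    return channels


if __name__ == "__main__":
    # Synthetic pulse-like signal: a fast oscillation riding on a slow drift.
    toy_signal = [
        math.sin(2 * math.pi * t / 8) + 0.5 * math.sin(2 * math.pi * t / 64)
        for t in range(128)
    ]
    encoded = multi_scale_encode(toy_signal)
    print(len(encoded), "channels of length", len(encoded[0]))
```

Short kernels keep the fast oscillation while long kernels retain mostly the slow drift, which is the sense in which parallel branches expose dynamics at different timescales.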

## Key findings

- The model achieved 66.04% accuracy for arousal recognition on the MAHNOB-HCI dataset.
- Valence recognition showed lower performance (62.26% accuracy), which the authors attribute to the physiological limits of unimodal temporal cardiovascular analysis.
- The framework outperformed prior CNN-LSTM baselines in arousal recognition.

## Abstract

What are the main findings?

A temporal-only rPPG framework with a multi-scale CNN, sparse α-Entmax attention, and Gated Pooling achieved 66.04% accuracy and a 61.97% weighted F1-score for arousal on MAHNOB-HCI (mixed subjects).

The model underperformed for valence (62.26% accuracy), highlighting the physiological limits of unimodal time-series signals.
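
The arousal result above is reported as both accuracy and weighted F1-score. As a minimal sketch of how the two metrics differ, the snippet below computes each from scratch on a small made-up label set. The labels are illustrative only and are not taken from MAHNOB-HCI, and the helper names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


if __name__ == "__main__":
    # Toy, imbalanced example: four "high" (1) and two "low" (0) sessions.
    y_true = [1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 1, 1, 0]
    print(round(accuracy(y_true, y_pred), 4))
    print(round(weighted_f1(y_true, y_pred), 4))
```

In this toy case the weighted F1-score comes out below the accuracy because the minority class is recalled poorly; a gap of that kind is one reason papers report both numbers.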

What are the implications of the main findings?

Temporal rPPG can rival other single-modality methods for arousal when physiologically inspired temporal modeling is applied.

Addressing valence requires the integration of spatial or multimodal cues, guiding future affective computing designs.

Remote photoplethysmography (rPPG) enables non-contact physiological measurement for emotion recognition, yet the temporally sparse nature of emotional cardiovascular responses, intrinsic measurement noise, weak session-level labels, and subtle correlates of valence pose critical challenges. To address these issues, we propose a physiologically inspired deep learning framework comprising a Multi-scale Temporal Dynamics Encoder (MTDE) to capture autonomic nervous system dynamics across multiple timescales, an adaptive sparse α-Entmax attention mechanism to identify salient emotional segments amidst noisy signals, Gated Temporal Pooling for the robust aggregation of emotional features, and a structured three-phase curriculum learning strategy to systematically handle temporal sparsity, weak labels, and noise. Evaluated on the MAHNOB-HCI dataset (27 subjects and 527 sessions with a subject-mixed split), our temporal-only model achieved competitive performance in arousal recognition (66.04% accuracy; 61.97% weighted F1-score), surpassing prior CNN-LSTM baselines. However, lower performance in valence (62.26% accuracy) revealed inherent physiological limitations of unimodal temporal cardiovascular analysis. These findings establish clear benchmarks for temporal-only rPPG emotion recognition and underscore the necessity of incorporating spatial or multimodal information to effectively capture nuanced emotional dimensions such as valence, guiding future research directions in affective computing.
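
The abstract names sparse α-Entmax attention and Gated Temporal Pooling, but this summary view does not give their equations. The sketch below is therefore only an approximation of the general idea. It uses sparsemax, which is the α = 2 member of the α-Entmax family (α = 1 recovers softmax), in place of the paper's adaptive α, and it pairs the attention weights with a simple sigmoid gate before averaging. The gating formula, the function names, and all parameters are assumptions for illustration, not the authors' design.

```python
import math


def sparsemax(scores):
    """Euclidean projection of scores onto the probability simplex (alpha-entmax, alpha = 2)."""
    ordered = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(ordered, start=1):
        cumsum += z_k
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k  # threshold for a support of size k
        else:
            break  # the support condition fails for all larger k
    return [max(s - tau, 0.0) for s in scores]


def gated_attention_pool(features, scores, gate_weights, gate_bias=0.0):
    """Pool T feature vectors into one vector using sparse attention and a sigmoid gate.

    features: list of T equal-length vectors (one per temporal segment)
    scores: T attention logits
    gate_weights, gate_bias: hypothetical gate parameters (learned in a real model)
    """
    attention = sparsemax(scores)
    gates = [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(gate_weights, h)) + gate_bias)))
        for h in features
    ]
    weights = [a * g for a, g in zip(attention, gates)]
    norm = sum(weights)  # positive: sparsemax keeps at least one segment
    dim = len(features[0])
    return [sum(w * h[d] for w, h in zip(weights, features)) / norm for d in range(dim)]


if __name__ == "__main__":
    segment_features = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    segment_scores = [1.0, 0.8, 0.1]
    print(sparsemax(segment_scores))
    print(gated_attention_pool(segment_features, segment_scores, gate_weights=[0.0, 0.0]))
```

Unlike softmax, sparsemax assigns exactly zero weight to the lowest-scoring segment here, so that segment contributes nothing to the pooled vector. This is the property that makes sparse attention attractive when only a few segments of a noisy recording carry emotional information.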

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12251639/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12251639/full.md

## References

60 references — full list in the complete paper: https://tomesphere.com/paper/PMC12251639/full.md

---
Source: https://tomesphere.com/paper/PMC12251639