# Learnable Feature Disentanglement with Temporal-Complemented Motion Enhancement for Micro-Expression Recognition

**Authors:** Yu Qian, Shucheng Huang, Kai Qu

PMC · DOI: 10.3390/e28020180 · Entropy · 2026-02-04

## TL;DR

This paper introduces LFD-TCMEN, a deep learning framework that recognizes micro-expressions by learning, end to end, to disentangle facial identity features from motion features.

## Contribution

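LFD-TCMEN combines a learnable disentanglement module (DRL) with a temporal motion enhancement module (TCME), trained end to end under a multi-task objective of orthogonality, reconstruction, consistency, cycle, identity, and classification losses.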

## Key findings

- LFD-TCMEN achieves state-of-the-art cross-subject performance on the CAS(ME)³ and DFME benchmarks.
- The proposed framework effectively isolates subtle emotional motions from identity-specific features.
- Synergistic optimization of multiple loss functions enhances the discriminative power of the model.
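
"Cross-subject" means that no person appears in both the training and the test data. This summary does not state the exact evaluation protocol, so the sketch below assumes leave-one-subject-out splitting, one common choice. The function name, the `(subject_id, clip_id, label)` layout, and the sample values are hypothetical.

```python
from collections import defaultdict


def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test) so no subject is in both sets.

    `samples` is a list of (subject_id, clip_id, label) tuples.
    This layout is an assumption made for illustration only.
    """
    by_subject = defaultdict(list)
    for sample in samples:
        by_subject[sample[0]].append(sample)

    for held_out in sorted(by_subject):
        test = by_subject[held_out]
        train = [
            s
            for subject, group in by_subject.items()
            if subject != held_out
            for s in group
        ]
        yield held_out, train, test


# Made-up clips from three subjects.
samples = [
    ("s01", "clip_a", "happy"),
    ("s01", "clip_b", "surprise"),
    ("s02", "clip_c", "disgust"),
    ("s03", "clip_d", "happy"),
]
folds = list(leave_one_subject_out(samples))
```

Each fold holds out all clips of one subject, so a model cannot score well by memorizing identity-specific appearance, which is the failure mode the disentanglement is meant to address.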

## Abstract

Micro-expressions (MEs) are involuntary facial movements that reveal genuine emotions, which makes them valuable in fields such as deception detection and psychological diagnosis. Micro-expression recognition (MER), however, is fundamentally challenged by the entanglement of subtle emotional motions with identity-specific features. Traditional methods, such as those based on Robust Principal Component Analysis (RPCA), attempt to separate identity and motion components through fixed preprocessing and coarse decomposition. These methods can inadvertently remove subtle emotional cues, and because they are disconnected from the training of subsequent modules, they limit the discriminative power of the features. Inspired by the Bruce–Young model of facial cognition, which suggests that facial identity and expression are processed via independent neural routes, we argue for a more dynamic, learnable disentanglement paradigm for MER. We propose LFD-TCMEN, a novel network built on an end-to-end learnable feature disentanglement framework. The network is jointly optimized by a multi-task objective that unifies orthogonality, reconstruction, consistency, cycle, identity, and classification losses. Specifically, the Disentangle Representation Learning (DRL) module adaptively isolates pure motion patterns from subject-specific appearance, overcoming the limitations of static preprocessing. The Temporal-Complemented Motion Enhancement (TCME) module then integrates the purified motion representations, which highlight subtle facial muscle activations, with optical flow dynamics to model the spatiotemporal evolution of MEs. Extensive experiments on the CAS(ME)³ and DFME benchmarks show that the method achieves state-of-the-art cross-subject performance, validating the proposed learnable disentanglement and synergistic optimization.
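
The abstract names six loss terms but this summary gives neither their formulas nor their weights. As a minimal sketch, the code below assumes the orthogonality term is a squared cosine similarity between an identity feature and a motion feature, and that the terms are combined by a weighted sum. Both choices, along with all function names, weights, and values, are illustrative assumptions rather than the authors' definitions.

```python
import math

# Loss names taken from the abstract; everything else here is hypothetical.
LOSS_NAMES = (
    "orthogonality",
    "reconstruction",
    "consistency",
    "cycle",
    "identity",
    "classification",
)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon avoids division by zero


def orthogonality_penalty(identity_feat, motion_feat):
    """Assumed form: squared cosine similarity, zero when orthogonal."""
    return cosine_similarity(identity_feat, motion_feat) ** 2


def total_loss(terms, weights):
    """Weighted sum of named loss terms (assumed combination rule)."""
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())


# Toy features: the identity and motion vectors are orthogonal here.
identity_feat = [1.0, 0.0, 0.0]
motion_feat = [0.0, 2.0, 0.0]

terms = {name: 0.0 for name in LOSS_NAMES}
terms["orthogonality"] = orthogonality_penalty(identity_feat, motion_feat)
terms["classification"] = 0.5  # placeholder value, not a real result

weights = {name: 1.0 for name in LOSS_NAMES}  # hypothetical equal weights
objective = total_loss(terms, weights)
```

In a real model each term would be computed from network outputs and minimized jointly by gradient descent. The point of the sketch is only that the penalty vanishes when identity and motion features share no direction.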

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12939205/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12939205/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12939205/full.md

---
Source: https://tomesphere.com/paper/PMC12939205