# MESTI-MEGANet: Micro-expression Spatio-Temporal Image and Micro-expression Gradient Attention Networks for Micro-expression Recognition

**Authors:** Luu Tu Nguyen, Vu Tram Anh Khuong, Thanh Ha Le, Thi Duyen Ngo

arXiv: 2509.00056 · 2025-09-09

## TL;DR

This paper introduces MESTI, an input modality that condenses a micro-expression video sequence into a single image, and MEGANet, a network built around a Gradient Attention block. Together they achieve state-of-the-art micro-expression recognition results on the CASMEII and SAMM datasets.

## Contribution

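The study contributes MESTI, a single-image representation of a micro-expression video sequence, and MEGANet, a recognition network whose Gradient Attention block targets fine-grained motion features. It also shows that MESTI improves previously published MER networks when substituted for their original input.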

## Key findings

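- MESTI outperforms existing input modalities across three CNN architectures (VGG19, ResNet50, and EfficientNetB0).
- Replacing the input of previously published MER networks with MESTI yields consistent performance improvements.
- MEGANet achieves state-of-the-art results on the CASMEII and SAMM datasets, and its combination with MESTI gives the highest accuracy reported to date.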

## Abstract

Micro-expression recognition (MER) is a challenging task due to the subtle and fleeting nature of micro-expressions. Traditional input modalities, such as Apex Frame, Optical Flow, and Dynamic Image, often fail to adequately capture these brief facial movements, resulting in suboptimal performance. In this study, we introduce the Micro-expression Spatio-Temporal Image (MESTI), a novel dynamic input modality that transforms a video sequence into a single image while preserving the essential characteristics of micro-movements. Additionally, we present the Micro-expression Gradient Attention Network (MEGANet), which incorporates a novel Gradient Attention block to enhance the extraction of fine-grained motion features from micro-expressions. By combining MESTI and MEGANet, we aim to establish a more effective approach to MER. Extensive experiments were conducted to evaluate the effectiveness of MESTI, comparing it with existing input modalities across three CNN architectures (VGG19, ResNet50, and EfficientNetB0). Moreover, we demonstrate that replacing the input of previously published MER networks with MESTI leads to consistent performance improvements. The performance of MEGANet, both with MESTI and Dynamic Image, is also evaluated, showing that our proposed network achieves state-of-the-art results on the CASMEII and SAMM datasets. The combination of MEGANet and MESTI achieves the highest accuracy reported to date, setting a new benchmark for micro-expression recognition. These findings underscore the potential of MESTI as a superior input modality and MEGANet as an advanced recognition network, paving the way for more effective MER systems in a variety of applications.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00056/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00056/full.md

---
Source: https://tomesphere.com/paper/2509.00056