TL;DR
This paper introduces an end-to-end deep learning framework that jointly recognizes micro-expressions, estimates optical flow, and detects facial landmarks. It combines transformer-style, graph-style, and vanilla convolution, and reports improved accuracy on the CASME II, SAMM, and SMIC benchmarks.
Contribution
The paper proposes a novel F5C block that combines a transformer-style fully-connected convolution with a graph-style channel correspondence convolution, extracting local-global features directly from a sequence of raw frames without prior knowledge of key frames.
Findings
Outperforms state-of-the-art MER methods on CASME II, SAMM, and SMIC datasets.
Effectively estimates optical flow and detects facial landmarks.
Captures subtle facial muscle actions related to micro-expressions.
Abstract
Facial micro-expression recognition (MER) is a challenging problem, due to transient and subtle micro-expression (ME) actions. Most existing methods depend on hand-crafted features, key frames like onset, apex, and offset frames, or deep networks limited by small-scale and low-diversity datasets. In this paper, we propose an end-to-end micro-action-aware deep learning framework with advantages from transformer, graph convolution, and vanilla convolution. In particular, we propose a novel F5C block composed of fully-connected convolution and channel correspondence convolution to directly extract local-global features from a sequence of raw frames, without the prior knowledge of key frames. The transformer-style fully-connected convolution is proposed to extract local features while maintaining global receptive fields, and the graph-style channel correspondence convolution is introduced…
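The joint setup described above (one framework trained for MER, optical flow estimation, and facial landmark detection) is commonly implemented as a weighted sum of per-task losses. The excerpt does not state the paper's actual objective, so the following is only a minimal sketch under assumptions: the choice of cross-entropy, endpoint error, and mean squared error, the function names, and the weights `w_flow` and `w_landmark` are all hypothetical and not taken from the paper.

```python
import math

# Hypothetical sketch of a weighted multi-task objective.
# The loss terms and weights are assumptions, not the paper's formulation.

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def endpoint_error(pred_flow, true_flow):
    """Mean Euclidean distance between predicted and reference (u, v) flow vectors."""
    dists = [math.hypot(pu - tu, pv - tv)
             for (pu, pv), (tu, tv) in zip(pred_flow, true_flow)]
    return sum(dists) / len(dists)

def landmark_mse(pred_pts, true_pts):
    """Mean squared distance between predicted and reference (x, y) landmarks."""
    sq = [(px - tx) ** 2 + (py - ty) ** 2
          for (px, py), (tx, ty) in zip(pred_pts, true_pts)]
    return sum(sq) / len(sq)

def joint_loss(logits, label, pred_flow, true_flow, pred_pts, true_pts,
               w_flow=1.0, w_landmark=1.0):
    """Weighted sum of the three task losses (weights are assumed hyperparameters)."""
    return (cross_entropy(logits, label)
            + w_flow * endpoint_error(pred_flow, true_flow)
            + w_landmark * landmark_mse(pred_pts, true_pts))

if __name__ == "__main__":
    loss = joint_loss(
        logits=[0.0, 0.0], label=0,
        pred_flow=[(3.0, 4.0)], true_flow=[(0.0, 0.0)],
        pred_pts=[(1.0, 1.0)], true_pts=[(0.0, 0.0)],
    )
    print(loss)
```

In a setup like this, the auxiliary weights control how strongly the flow and landmark tasks shape the shared features; setting a weight to zero removes that task from the objective.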
Taxonomy
Methods: Convolution
