Shuffle Transformer with Feature Alignment for Video Face Parsing
Rui Zhang, Yang Han, Zilong Huang, Pei Cheng, Guozhong Luo, Gang Yu, Bin Fu

TL;DR
This technical report presents a Shuffle Transformer backbone combined with a Feature Alignment Aggregation (FAA) module to improve face parsing accuracy in short videos. The method took first place in the Short-video Face Parsing track of the 3rd Person in Context (PIC) Challenge at CVPR 2021.
Contribution
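The report introduces a cross-window Shuffle Transformer backbone for accurate face parsing representations, and an FAA module that relieves the feature misalignment caused by multi-resolution feature aggregation, yielding finer results on edges.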
Findings
Achieved a score of 86.9519% in the Short-video Face Parsing track of the 3rd PIC Workshop and Challenge.
Ranked first in the Short-video Face Parsing track.
Attributed the result to the stronger backbone and better feature aggregation, with the FAA module aimed at finer segmentation along edges.
Abstract
This short technical report introduces Team TCParser's solution for the Short-video Face Parsing track of the 3rd Person in Context (PIC) Workshop and Challenge at CVPR 2021. We introduce a strong backbone, the cross-window-based Shuffle Transformer, for learning accurate face parsing representations. To obtain finer segmentation results, especially on edges, we introduce a Feature Alignment Aggregation (FAA) module, which effectively relieves the feature misalignment caused by multi-resolution feature aggregation. Benefiting from the stronger backbone and better feature aggregation, the proposed method achieves a score of 86.9519% in the Short-video Face Parsing track of the 3rd PIC Workshop and Challenge, ranking first.
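The abstract does not spell out how the backbone connects windows, so the following is only a minimal sketch of one common reading of "cross-window" shuffling: tokens are permuted so that each attention window gathers tokens from every original window. It works on a 1-D token list for clarity; the function names (`window_partition`, `spatial_shuffle`, `spatial_unshuffle`) are hypothetical and are not taken from the authors' code.

```python
def window_partition(tokens, window_size):
    """Split a 1-D token sequence into consecutive, non-overlapping windows."""
    return [tokens[i:i + window_size] for i in range(0, len(tokens), window_size)]


def spatial_shuffle(tokens, window_size):
    """Permute tokens so each later window draws from all original windows.

    Assumption: this mirrors a reshape to (window_size, num_windows),
    a transpose, and a flatten. len(tokens) must be divisible by window_size.
    """
    num_windows = len(tokens) // window_size
    return [tokens[k * num_windows + j]
            for j in range(num_windows)
            for k in range(window_size)]


def spatial_unshuffle(shuffled, window_size):
    """Invert spatial_shuffle, restoring the original token order."""
    num_windows = len(shuffled) // window_size
    return [shuffled[j * window_size + k]
            for k in range(window_size)
            for j in range(num_windows)]


tokens = list(range(8))
local_windows = window_partition(tokens, 4)
shuffled_windows = window_partition(spatial_shuffle(tokens, 4), 4)
restored = spatial_unshuffle(spatial_shuffle(tokens, 4), 4)
```

Here `local_windows` groups neighbouring tokens, while each entry of `shuffled_windows` mixes tokens from both local windows, so attention computed inside a shuffled window can exchange information across window boundaries. Unshuffling afterwards restores the original layout for the next local-window layer.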
Taxonomy
Topics: Face recognition and analysis · Video Surveillance and Tracking Methods · Face and Expression Recognition
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Layer Normalization · Multi-Head Attention · Label Smoothing · Residual Connection
