Shuffle Transformer with Feature Alignment for Video Face Parsing

Rui Zhang; Yang Han; Zilong Huang; Pei Cheng; Guozhong Luo; Gang Yu; Bin Fu

arXiv:2106.08650·cs.CV·June 17, 2021·1 cites

TL;DR

This paper presents a Shuffle Transformer backbone combined with a Feature Alignment Aggregation (FAA) module for face parsing in short videos. The method ranked first in the Short-video Face Parsing track of the 3rd Person in Context (PIC) challenge at CVPR 2021.

Contribution

The paper introduces a cross-window Shuffle Transformer backbone and a Feature Alignment Aggregation (FAA) module, which together improve face parsing accuracy and yield finer segmentation along edges.

Findings

01. Achieved a score of 86.95% in the PIC workshop challenge.

02. Ranked first in the Short-video Face Parsing track.

03. Obtained finer segmentation results, especially along edges, with the proposed method.

Abstract

This short technical report describes the solution of Team TCParser for the Short-video Face Parsing track of the 3rd Person in Context (PIC) Workshop and Challenge at CVPR 2021. We introduce a strong backbone, the cross-window based Shuffle Transformer, for learning accurate face parsing representations. To obtain finer segmentation results, especially along edges, we introduce a Feature Alignment Aggregation (FAA) module, which effectively relieves the feature misalignment caused by multi-resolution feature aggregation. Benefiting from the stronger backbone and better feature aggregation, the proposed method achieves a score of 86.9519% in the Short-video Face Parsing track and ranks first.
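
The abstract does not spell out how the cross-window shuffle works, so the following is only an illustrative sketch of one common way to build such windows: ordinary window partitioning groups neighbouring tokens, while a strided ("shuffled") partitioning groups tokens drawn from across the whole grid, so attention computed inside a window can mix distant positions. The function names `window_partition` and `shuffled_window_partition` are hypothetical, and the exact layout used by the authors may differ.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W) token grid into non-overlapping ws x ws windows
    of neighbouring tokens. Assumes H and W are divisible by ws."""
    H, W = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws)
    return x.transpose(0, 2, 1, 3).reshape(-1, ws, ws)

def shuffled_window_partition(x, ws):
    """Assumed shuffle variant: each ws x ws window collects tokens that are
    H // ws rows and W // ws columns apart, so it spans the whole grid."""
    H, W = x.shape
    x = x.reshape(ws, H // ws, ws, W // ws)
    return x.transpose(1, 3, 0, 2).reshape(-1, ws, ws)

# A 4 x 4 grid whose entries are the flat token indices 0..15.
grid = np.arange(16).reshape(4, 4)
local = window_partition(grid, ws=2)
shuffled = shuffled_window_partition(grid, ws=2)

print(local[0].tolist())     # [[0, 1], [4, 5]]   neighbouring tokens
print(shuffled[0].tolist())  # [[0, 2], [8, 10]]  tokens two steps apart
```

Alternating the two partitionings across layers would let information flow between windows without computing attention over the full grid, which is the motivation usually given for cross-window designs.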

Taxonomy

Topics: Face recognition and analysis · Video Surveillance and Tracking Methods · Face and Expression Recognition

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Layer Normalization · Multi-Head Attention · Label Smoothing · Residual Connection