TL;DR
ATTN-FIQA introduces a training-free, attention-based face image quality assessment method leveraging pre-trained Vision Transformers, providing interpretable quality scores without additional training or modifications.
Contribution
The paper demonstrates that pre-softmax attention scores from pre-trained Vision Transformers can serve as effective, interpretable face image quality indicators without extra training.
Findings
Attention scores correlate with face image quality across benchmarks.
The method requires only a single forward pass and no retraining.
Attention patterns reveal facial regions influencing quality assessment.
Abstract
Face Image Quality Assessment (FIQA) aims to assess the recognition utility of face samples and is essential for reliable face recognition (FR) systems. Existing approaches require computationally expensive procedures such as multiple forward passes, backpropagation, or additional training, and only recent work has focused on the use of Vision Transformers. Recent studies highlighted that these architectures inherently function as saliency learners with attention patterns naturally encoding spatial importance. This work proposes ATTN-FIQA, a novel training-free approach that investigates whether pre-softmax attention scores from pre-trained Vision Transformer-based face recognition models can serve as quality indicators. We hypothesize that attention magnitudes intrinsically encode quality: high-quality images with discriminative facial features enable strong query-key alignments…
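The core idea in the abstract, reading a quality signal from pre-softmax attention scores, can be illustrated with a minimal NumPy sketch. The abstract is truncated before it says how the scores are aggregated, so the plain mean used here is an assumption, not the paper's stated rule. The function names and the projection matrices `w_q` and `w_k` are hypothetical stand-ins; in practice the projections would come from a block of a pre-trained Vision Transformer face recognition model.

```python
import numpy as np


def pre_softmax_attention(tokens, w_q, w_k, num_heads):
    """Scaled query-key logits (before softmax), shape (heads, n_tokens, n_tokens).

    tokens: (n_tokens, dim) patch embeddings entering an attention block.
    w_q, w_k: (dim, dim) query/key projections (hypothetical stand-ins here).
    """
    n_tokens, dim = tokens.shape
    head_dim = dim // num_heads
    # Project, then split the channel axis into heads: (heads, n_tokens, head_dim)
    q = (tokens @ w_q).reshape(n_tokens, num_heads, head_dim).transpose(1, 0, 2)
    k = (tokens @ w_k).reshape(n_tokens, num_heads, head_dim).transpose(1, 0, 2)
    # Standard scaled dot product; the softmax is deliberately not applied
    return q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)


def attention_quality_score(logits):
    """Collapse the logits to one scalar per image.

    ASSUMPTION: a plain mean over heads and token pairs. The source does not
    confirm this aggregation.
    """
    return float(logits.mean())


def patch_importance(logits):
    """Mean logit received by each key token, averaged over heads and queries."""
    return logits.mean(axis=(0, 1))


# Toy usage with random data standing in for a real model's activations
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))
w_q = rng.normal(size=(8, 8))
w_k = rng.normal(size=(8, 8))
logits = pre_softmax_attention(tokens, w_q, w_k, num_heads=2)
score = attention_quality_score(logits)
importance = patch_importance(logits)  # one value per token/patch
```

Everything here comes from one set of activations, which matches the single-forward-pass property listed in the findings. The per-token values from `patch_importance` are one simple way to inspect which regions contribute most to the score.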
