Attention-Aware Transformer-Based Aggregation Network for Video Periocular Recognition
Luiz G F Carreira, Breno A Mariano, Victor H C de Melo, David Menotti, William Robson Schwartz

TL;DR
This paper introduces an attention-aware transformer-based network for video periocular recognition, improving robustness and accuracy in surveillance scenarios by adaptively aggregating frame features.
Contribution
It proposes a two-module framework, a deep convolutional feature embedder followed by an encoder-only transformer that aggregates frame-level features, which outperforms naive aggregation schemes for video periocular recognition.
Findings
Achieves 99.8% [email protected] false positive rate on the COX Face dataset.
Outperforms naive aggregation schemes consistently.
Demonstrates robustness in unconstrained surveillance environments.
Abstract
Video periocular recognition is the task of recognizing an individual's identity from the region around their eyes. The periocular area is one of the most discriminative regions of the human face, making it suitable for recognition tasks. Its use as a biometric modality has emerged as an alternative, especially in surveillance scenarios where conventional modalities such as face or iris recognition become infeasible due to unconstrained acquisition conditions. This paper proposes an attention-aware approach for video-based periocular recognition in surveillance environments. The framework consists of two main modules: feature embedding and aggregation. The feature embedding module is a deep convolutional neural network that maps periocular data to feature vectors. The aggregation module is an encoder-only transformer that adaptively learns to aggregate frame-level…
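To make the contrast between adaptive and naive aggregation concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: a single attention-pooling step stands in for the encoder-only transformer, the frame embeddings are toy 2-D vectors rather than CNN outputs, and `query` is a hypothetical hand-set vector that a real model would learn. The names `mean_pool` and `attention_pool` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(frames):
    """Naive aggregation: element-wise average of the frame embeddings."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def attention_pool(frames, query):
    """Adaptive aggregation: weight each frame by softmax(query . frame / sqrt(d)).

    Returns the weighted sum of frames and the per-frame weights.
    """
    d = len(query)
    scores = [
        sum(q * x for q, x in zip(query, frame)) / math.sqrt(d)
        for frame in frames
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * frame[i] for w, frame in zip(weights, frames))
        for i in range(d)
    ]
    return pooled, weights


# Toy frame-level embeddings (assumed, not from the paper):
# two consistent frames and one outlier, e.g. a blurred or occluded frame.
frames = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],  # outlier frame
]
query = [4.0, 0.0]  # hypothetical stand-in for a learned query

pooled, weights = attention_pool(frames, query)
naive = mean_pool(frames)
```

In this toy setting the outlier frame receives the smallest weight, so the attention-pooled vector stays closer to the two consistent frames than the plain average does. A full transformer encoder would additionally let frames attend to one another before pooling.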
