FaceLiVT: Face Recognition using Linear Vision Transformer with Structural Reparameterization For Mobile Device

Novendra Setyawan; Chi-Chia Sun; Mao-Hsiu Hsu; Wen-Kai Kuo; Jun-Wei Hsieh

arXiv:2506.10361 · cs.CV · December 8, 2025

TL;DR

FaceLiVT is a lightweight face recognition model that combines a hybrid CNN-Transformer architecture with a Multi-Head Linear Attention mechanism, achieving high accuracy at low latency on mobile devices.

Contribution

The paper introduces FaceLiVT, a new hybrid CNN-Transformer model with a lightweight Multi-Head Linear Attention (MHLA) mechanism and a reparameterized token mixer for efficient face recognition.
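This page does not describe how Multi-Head Linear Attention (MHLA) works internally, so the following is only a hedged illustration of why linear attention reduces cost. Softmax attention builds an N×N score matrix per head, costing O(N²·d) for N tokens of width d. The kernelized form sketched here (the elu(x)+1 feature map of Katharopoulos et al., 2020) computes φ(K)ᵀV first, which costs O(N·d²) and is linear in N. This feature map and the helper names `feature_map`, `linear_attention`, and `multi_head_linear_attention` are assumptions for illustration, not FaceLiVT's actual MHLA.

```python
import numpy as np

def feature_map(x):
    # phi(x) = elu(x) + 1, a positive kernel feature map (Katharopoulos et al., 2020).
    # Assumed here; FaceLiVT's MHLA may use a different map or none at all.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v, eps=1e-6):
    """Single-head linear attention. q, k: (N, d); v: (N, d_v).
    Cost is O(N * d * d_v) because phi(K)^T V is computed before touching q."""
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v          # (d, d_v): one summary of all keys and values
    z = k.sum(axis=0)     # (d,): summary used to normalize each query's weights
    return (q @ kv) / (q @ z + eps)[:, None]

def multi_head_linear_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Hypothetical multi-head wrapper. x: (N, C); all weights: (C, C)."""
    _, c = x.shape
    d = c // num_heads
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    heads = [linear_attention(q[:, h * d:(h + 1) * d],
                              k[:, h * d:(h + 1) * d],
                              v[:, h * d:(h + 1) * d])
             for h in range(num_heads)]
    # Concatenate per-head outputs and apply the output projection.
    return np.concatenate(heads, axis=-1) @ w_o

# Usage: 49 tokens (e.g. a 7x7 feature map) with 64 channels and 4 heads.
rng = np.random.default_rng(0)
x = rng.standard_normal((49, 64))
w_q, w_k, w_v, w_o = (rng.standard_normal((64, 64)) * 0.1 for _ in range(4))
out = multi_head_linear_attention(x, w_q, w_k, w_v, w_o, num_heads=4)
print(out.shape)  # (49, 64)
```

Because `kv` and `z` summarize all keys once, the per-query cost does not grow with N; the trade-off is that the kernel φ replaces the exact softmax weighting.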

Findings

01. FaceLiVT outperforms state-of-the-art lightweight models on multiple benchmarks.

02. It achieves 8.6× faster inference than EdgeFace, a recent hybrid CNN-Transformer model optimized for edge devices.

03. It is 21.2× faster than a pure ViT-based model, with competitive accuracy.

Abstract

This paper introduces FaceLiVT, a lightweight yet powerful face recognition model that integrates a hybrid Convolutional Neural Network (CNN)-Transformer architecture with an innovative and lightweight Multi-Head Linear Attention (MHLA) mechanism. By combining MHLA with a reparameterized token mixer, FaceLiVT effectively reduces computational complexity while preserving competitive accuracy. Extensive evaluations on challenging benchmarks, including LFW, CFP-FP, AgeDB-30, IJB-B, and IJB-C, highlight its superior performance compared to state-of-the-art lightweight models. MHLA notably improves inference speed, allowing FaceLiVT to deliver high accuracy with lower latency on mobile devices. Specifically, FaceLiVT is 8.6× faster than EdgeFace, a recent hybrid CNN-Transformer model optimized for edge devices, and 21.2× faster than a pure ViT-based model. With its balanced design, FaceLiVT…
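Structural reparameterization, as popularized by RepVGG and FastViT's RepMixer, trains a block with several parallel branches and then merges them algebraically into one convolution for inference, so the deployed model runs a single kernel. The NumPy sketch below assumes a hypothetical depthwise token mixer made of an identity branch plus 3×3 and 1×1 depthwise convolutions, each followed by BatchNorm; FaceLiVT's actual branch layout may differ. Each BatchNorm is folded into its convolution, the 1×1 and identity kernels are embedded at the center of a 3×3 kernel, and kernels and biases are summed. All function names are illustrative.

```python
import numpy as np

def depthwise_conv2d(x, w, b):
    """Depthwise 2D cross-correlation, stride 1, zero 'same' padding.
    x: (C, H, W); w: (C, k, k) with odd k; b: (C,)."""
    c, h, wd = x.shape
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c, h, wd))
    for i in range(k):
        for j in range(k):
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out + b[:, None, None]

def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    # Inference-mode BatchNorm with frozen running statistics.
    s = gamma / np.sqrt(var + eps)
    return s[:, None, None] * (x - mean[:, None, None]) + beta[:, None, None]

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    # BN(conv(x)) = s * (conv_w(x) + b - mean) + beta, so scale each channel's kernel by s.
    s = gamma / np.sqrt(var + eps)
    return w * s[:, None, None], s * (b - mean) + beta

def center_in_3x3(w1):
    # Embed a 1x1 kernel at the center of a zero 3x3 kernel (same output with padding 1).
    w3 = np.zeros((w1.shape[0], 3, 3))
    w3[:, 1, 1] = w1[:, 0, 0]
    return w3

def train_block(x, branch3, branch1):
    # Hypothetical training-time form: x + BN(DW3x3(x)) + BN(DW1x1(x)).
    # Each branch is a tuple (w, b, gamma, beta, mean, var).
    w3, b3, *bn3 = branch3
    w1, b1, *bn1 = branch1
    return (x
            + batchnorm(depthwise_conv2d(x, w3, b3), *bn3)
            + batchnorm(depthwise_conv2d(x, w1, b1), *bn1))

def reparameterize(branch3, branch1):
    # Inference-time form: a single depthwise 3x3 kernel and bias.
    w3, b3 = fuse_conv_bn(*branch3)
    w1, b1 = fuse_conv_bn(*branch1)
    identity = np.zeros_like(w3)
    identity[:, 1, 1] = 1.0
    return w3 + center_in_3x3(w1) + identity, b3 + b1

# Usage: random branch parameters; both forms should agree numerically.
rng = np.random.default_rng(0)
C = 4

def random_branch(k):
    return (rng.standard_normal((C, k, k)), rng.standard_normal(C),
            rng.uniform(0.5, 1.5, C), rng.standard_normal(C),
            rng.standard_normal(C), rng.uniform(0.5, 1.5, C))

branch3, branch1 = random_branch(3), random_branch(1)
x = rng.standard_normal((C, 8, 8))
w_fused, b_fused = reparameterize(branch3, branch1)
y_train = train_block(x, branch3, branch1)
y_infer = depthwise_conv2d(x, w_fused, b_fused)
print(np.abs(y_train - y_infer).max())
```

The merge works because every branch is linear in its input and, at inference, BatchNorm is a fixed per-channel affine map, so the sum of branches collapses into one kernel and one bias with no change in output.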

Taxonomy

Methods: Softmax · Attention Is All You Need · Linear Layer · Multi-Head Linear Attention · Convolution