FaceLiVTv2: An Improved Hybrid Architecture for Efficient Mobile Face Recognition

Novendra Setyawan; Chi-Chia Sun; Mao-Hsiu Hsu; Wen-Kai Kuo; Jun-Wei Hsieh

arXiv:2604.09127·cs.CV·May 7, 2026

TL;DR

FaceLiVTv2 is a hybrid CNN-Transformer architecture for mobile face recognition. It uses lightweight global-local feature interaction modules to improve both efficiency and accuracy, cutting mobile inference latency by 22% relative to FaceLiVTv1.

Contribution

The paper introduces Lite MHLA, a lightweight global token interaction module, and a unified RepMix block. Together they give a hybrid CNN-Transformer model for mobile face recognition better performance and lower latency.

Findings

01. Reduces mobile inference latency by 22% compared to FaceLiVTv1.

02. Achieves up to 30.8% speedup over GhostFaceNets on mobile devices.

03. Maintains higher recognition accuracy while improving latency by 20-41%.
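
The findings above report efficiency in two ways: as a latency reduction and as a speedup. These are not interchangeable numbers. The sketch below converts between them, assuming a speedup is the ratio of old latency to new latency. This page does not state which definition the paper uses, and the function names are hypothetical helpers, not part of the paper or its repository.

```python
def speedup_from_latency_reduction(reduction: float) -> float:
    """Convert a fractional latency reduction (0.22 means 22% lower latency)
    into a speedup factor, assuming speedup = old_latency / new_latency."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def latency_reduction_from_speedup(speedup: float) -> float:
    """Inverse conversion: a speedup factor (1.308 means 30.8% faster)
    back to a fractional latency reduction, under the same assumption."""
    if speedup < 1.0:
        raise ValueError("speedup must be >= 1")
    return 1.0 - 1.0 / speedup


# A 22% latency reduction corresponds to roughly a 1.28x speedup.
factor = speedup_from_latency_reduction(0.22)
print(f"22% lower latency ~ {factor:.2f}x speedup")

# If "30.8% speedup" is read as a 1.308x factor (an assumption),
# the equivalent latency reduction is smaller than 30.8%.
reduction = latency_reduction_from_speedup(1.308)
print(f"1.308x speedup ~ {reduction:.1%} lower latency")
```

Under this reading, a percentage speedup always corresponds to a smaller percentage latency reduction, so the two kinds of figure should not be compared directly.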

Abstract

Lightweight face recognition is increasingly important for deployment on edge and mobile devices, where strict constraints on latency, memory, and energy consumption must be met alongside reliable accuracy. Although recent hybrid CNN-Transformer architectures have advanced global context modeling, striking an effective balance between recognition performance and computational efficiency remains an open challenge. In this work, we present FaceLiVTv2, an improved version of our FaceLiVT hybrid architecture designed for efficient global-local feature interaction in mobile face recognition. At its core is Lite MHLA, a lightweight global token interaction module that replaces the original multi-layer attention design with multi-head linear token projections and affine rescale transformations, reducing redundancy while preserving representational diversity across heads. We further integrate…
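
To make the phrase "multi-head linear token projections and affine rescale transformations" concrete, here is a minimal NumPy sketch of one possible reading. It is not the authors' implementation. It assumes that channels are split evenly into heads, that each head mixes tokens with its own learned N×N matrix, and that the result is rescaled by a per-channel scale and shift. The function name, argument names, and shapes are all hypothetical.

```python
import numpy as np


def multi_head_linear_token_mixing(x, token_weights, scale, shift):
    """Illustrative token mixing (an assumed reading of the abstract).

    x:             (num_tokens, channels) token features
    token_weights: (heads, num_tokens, num_tokens), one mixing matrix per head
    scale, shift:  (channels,) per-channel affine rescale parameters
    """
    num_tokens, channels = x.shape
    heads = token_weights.shape[0]
    if channels % heads != 0:
        raise ValueError("channels must be divisible by heads")
    head_dim = channels // heads

    # Split channels into heads: (heads, num_tokens, head_dim).
    xh = x.reshape(num_tokens, heads, head_dim).transpose(1, 0, 2)

    # Each head mixes information across tokens with its own linear map.
    # No softmax or query-key product is computed here.
    mixed = token_weights @ xh  # (heads, num_tokens, head_dim)

    # Merge heads back to (num_tokens, channels).
    out = mixed.transpose(1, 0, 2).reshape(num_tokens, channels)

    # Per-channel affine rescale.
    return out * scale + shift


# Example with made-up sizes: a 7x7 token grid, 64 channels, 4 heads.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((49, 64))
weights = rng.standard_normal((4, 49, 49)) / 49.0
out = multi_head_linear_token_mixing(tokens, weights, np.ones(64), np.zeros(64))
print(out.shape)  # (49, 64)
```

Because the mixing matrices are fixed after training rather than computed from the input, a module of this kind avoids the query-key product of standard attention. That is one plausible source of the latency savings described above, though the abstract as shown here does not confirm the details.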

Code & Models

Repositories

novendrastywn/FaceLiVT (GitHub)