The Latency Wall: Benchmarking Off-the-Shelf Emotion Recognition for Real-Time Virtual Avatars
Yarin Benyamin

TL;DR
This paper benchmarks state-of-the-art off-the-shelf models for real-time facial expression recognition in virtual avatars, revealing a latency-accuracy trade-off and emphasizing the need for lightweight, domain-specific architectures for VR therapy.
Contribution
It evaluates existing models' performance on VR avatars, identifying the latency wall and highlighting the gap in real-time, accurate emotion recognition for accessible VR therapy.
Findings
Face detection on avatars is robust with 100% accuracy.
YOLOv11n balances detection speed (~54 ms) and accuracy.
General-purpose Transformers underperform in speed and accuracy for real-time use.
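The findings above compare per-stage latency against a fixed real-time budget. A minimal sketch of that comparison follows, assuming wall-clock timing of a single pipeline stage on CPU. The helper names `time_stage` and `summarize_latencies` are hypothetical and are not taken from the paper; only the 140 ms ceiling comes from the abstract.

```python
import math
import statistics
import time

MTP_BUDGET_MS = 140.0  # latency ceiling quoted in the abstract


def time_stage(stage, n_runs=20, warmup=3):
    """Return per-call wall-clock latencies (ms) for a zero-argument callable.

    `stage` is a stand-in for one pipeline step (e.g. a face detector call);
    it is a hypothetical placeholder, not the paper's code.
    """
    for _ in range(warmup):  # discard warm-up calls (caches, lazy init)
        stage()
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        stage()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize_latencies(samples_ms, budget_ms=MTP_BUDGET_MS):
    """Summarize latencies and compare them with the budget.

    Uses the nearest-rank 95th percentile, so a single slow frame
    in a small sample can push the stage over budget.
    """
    ordered = sorted(samples_ms)
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    median = statistics.median(ordered)
    p95 = ordered[p95_index]
    return {
        "median_ms": median,
        "p95_ms": p95,
        "headroom_ms": budget_ms - median,  # budget left for other stages
        "within_budget": p95 <= budget_ms,
    }
```

Under these assumptions, a detection stage with a steady 54 ms latency would leave about 86 ms of headroom for expression classification and rendering, if the whole 140 ms budget were available to inference.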
Abstract
In the realm of Virtual Reality (VR) and Human-Computer Interaction (HCI), real-time emotion recognition shows promise for supporting individuals with Autism Spectrum Disorder (ASD) in improving social skills. This task requires a strict latency-accuracy trade-off, with motion-to-photon (MTP) latency kept below 140 ms to maintain contingency. However, most off-the-shelf Deep Learning models prioritize accuracy over the strict timing constraints of commodity hardware. As a first step toward accessible VR therapy, we benchmark State-of-the-Art (SOTA) models for Zero-Shot Facial Expression Recognition (FER) on virtual characters using the UIBVFED dataset. We evaluate Medium and Nano variants of YOLO (v8, v11, and v12) for face detection, alongside general-purpose Vision Transformers including CLIP, SigLIP, and ViT-FER. Our results on CPU-only inference demonstrate that while face detection…
Taxonomy
Topics: Emotion and Mood Recognition · Autism Spectrum Disorder Research · Face Recognition and Perception
