# Real-Time Emotion Recognition Performance of Mobile Devices: A Detailed Analysis of Camera and TrueDepth Sensors Using Apple’s ARKit

**Authors:** Céline Madeleine Aldenhoven, Leon Nissen, Marie Heinemann, Cem Doğdu, Alexander Hanke, Stephan Jonas, Lara Marie Reimer

PMC · DOI: 10.3390/s26031060 · Sensors (Basel, Switzerland) · 2026-02-06

## TL;DR

This study shows that a simple cosine-similarity classifier over Apple's ARKit blend shapes, running on an iPhone 14 Pro, recognizes seven discrete emotions in real time with 68.3% overall accuracy, compared with a mean of 58.9% for three human raters.

## Contribution

The paper introduces a real-time, on-device emotion recognition method that classifies standardized ARKit blend shapes per frame using a prototype-based cosine similarity metric.

## Key findings

- The cosine-similarity classifier achieved 68.3% overall accuracy, exceeding the mean of three human raters (58.9%) by 9.4 percentage points (≈16% relative).
- AUCs were ≥0.84 for all emotion classes.
- The method runs in real time on-device using only vector operations, which keeps compute low and preserves privacy.
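The two headline metrics above can be reproduced in a few lines. The following is a minimal sketch, not the authors' evaluation code: the accuracy figures are taken from the abstract, while the function name `auc_one_vs_rest`, the one-vs-rest framing of the per-class AUC, and the toy scores are assumptions made for illustration.

```python
# Accuracy figures reported in the abstract (percent).
CLASSIFIER_ACC = 68.3
HUMAN_MEAN_ACC = 58.9

# Absolute gain in percentage points and gain relative to the human baseline.
absolute_gain = CLASSIFIER_ACC - HUMAN_MEAN_ACC
relative_gain = absolute_gain / HUMAN_MEAN_ACC


def auc_one_vs_rest(scores, labels, positive):
    """AUC for one class against all others (assumed one-vs-rest setup).

    Uses the rank interpretation of AUC: the probability that a randomly
    chosen positive frame scores higher than a randomly chosen negative
    frame, counting ties as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented similarity scores for the hypothetical class "joy".
toy_scores = [0.9, 0.2, 0.8, 0.1]
toy_labels = ["joy", "joy", "fear", "fear"]
toy_auc = auc_one_vs_rest(toy_scores, toy_labels, "joy")
```

In a multi-class setting, the score passed in for each class would be that frame's similarity to the class in question, so one AUC is obtained per emotion.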

## Abstract

Facial features hold information about a person’s emotions, motor function, or genetic defects. Since most current mobile devices are capable of real-time face detection using cameras and depth sensors, real-time facial analysis can be utilized in several mobile use cases. Understanding the real-time emotion recognition capabilities of device sensors and frameworks is vital for developing new, valid applications. Therefore, we evaluated on-device emotion recognition using Apple’s ARKit on an iPhone 14 Pro. A native app elicited 36 blend shape-specific movements and 7 discrete emotions from N=31 healthy adults. Per frame, standardized ARKit blend shapes were classified using a prototype-based cosine similarity metric; performance was summarized as accuracy and area under the receiver operating characteristic curves. Cosine similarity achieved an overall accuracy of 68.3%, exceeding the mean of three human raters (58.9%; +9.4 percentage points, ≈16% relative). Per-emotion accuracy was highest for joy, fear, sadness, and surprise, and competitive for anger, disgust, and contempt. AUCs were ≥0.84 for all classes. The method runs in real time on-device using only vector operations, preserving privacy and minimizing compute. These results indicate that a simple, interpretable cosine-similarity classifier over ARKit blend shapes delivers human-comparable, real-time facial emotion recognition on commodity hardware, supporting privacy-preserving mobile applications.
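The per-frame classification described in the abstract can be illustrated with a short sketch. It is written in Python for readability and is not the authors' implementation. Several details are assumptions, since the abstract does not specify them: standardization is taken to be a per-dimension z-score, each prototype is taken to be the mean standardized vector of one emotion, and all function names and toy values are hypothetical. Real ARKit frames carry many more blend shape coefficients than the three used here.

```python
import math
from statistics import mean, pstdev

# Toy frames with three blend-shape coefficients each, standing in for
# ARKit's mouthSmileLeft, jawOpen, and browInnerUp. Values are invented.
TRAIN = [
    ([0.9, 0.1, 0.1], "joy"),
    ([0.8, 0.2, 0.0], "joy"),
    ([0.1, 0.9, 0.8], "surprise"),
    ([0.0, 0.8, 0.9], "surprise"),
]


def fit_standardizer(frames):
    """Per-dimension mean and standard deviation (assumed z-score scheme)."""
    cols = list(zip(*frames))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return mus, sigmas


def standardize(frame, mus, sigmas):
    return [(x - m) / s for x, m, s in zip(frame, mus, sigmas)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def build_prototypes(frames, labels):
    """One prototype per emotion: the mean of its frames (an assumption)."""
    grouped = {}
    for frame, label in zip(frames, labels):
        grouped.setdefault(label, []).append(frame)
    return {label: [mean(c) for c in zip(*rows)] for label, rows in grouped.items()}


def classify(frame, prototypes):
    """Return the best-matching label and the similarity to every prototype."""
    scores = {label: cosine_similarity(frame, p) for label, p in prototypes.items()}
    return max(scores, key=scores.get), scores


MUS, SIGMAS = fit_standardizer([f for f, _ in TRAIN])
PROTOTYPES = build_prototypes(
    [standardize(f, MUS, SIGMAS) for f, _ in TRAIN],
    [label for _, label in TRAIN],
)


def predict(frame):
    """Standardize a raw frame with the fitted statistics, then classify it."""
    return classify(standardize(frame, MUS, SIGMAS), PROTOTYPES)[0]
```

Because cosine similarity depends only on the angle between vectors, a classifier of this kind compares the pattern of blend shape activation rather than its overall intensity, and it needs nothing beyond dot products and norms at inference time.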

## Full-text entities

- **Diseases:** genetic defects (MESH:D030342)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12899966/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12899966/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/PMC12899966/full.md

---
Source: https://tomesphere.com/paper/PMC12899966