Test-Time Adaptation via Cache Personalization for Facial Expression Recognition in Videos
Masoumeh Sharafi, Muhammad Osama Zeeshan, Soufiane Belharbi, Alessandro Lameiras Koerich, Marco Pedersoli, and Eric Granger

TL;DR
This paper proposes TTA-CaP, a gradient-free, cache-based test-time adaptation method for facial expression recognition (FER) in videos that improves performance under subject and environment shifts at low computational cost.
Contribution
Introducing TTA-CaP, a novel cache personalization approach that uses multiple caches and a tri-gate mechanism for stable, low-overhead test-time adaptation of vision-language models in video FER.
Findings
Outperforms state-of-the-art TTA methods on three FER datasets.
Maintains low computational and memory overhead.
Effective under subject-specific and environmental shifts.
Abstract
Facial expression recognition (FER) in videos requires model personalization to capture the considerable variations across subjects. Vision-language models (VLMs) offer strong transfer to downstream tasks through image-text alignment, but their performance can still degrade under inter-subject distribution shifts. Personalizing models using test-time adaptation (TTA) methods can mitigate this challenge. However, most state-of-the-art TTA methods rely on unsupervised parameter optimization, introducing computational overhead that is impractical in many real-world applications. This paper introduces TTA through Cache Personalization (TTA-CaP), a cache-based TTA method that enables cost-effective (gradient-free) personalization of VLMs for video FER. Prior cache-based TTA methods rely solely on dynamic memories that store test samples, which can accumulate errors and drift due to noisy…
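To make the gradient-free caching idea concrete, the following is a minimal sketch of a generic dynamic cache of the kind the abstract attributes to prior cache-based TTA methods (in the style of training-free adapters such as TDA). It is not TTA-CaP: the paper's multiple caches and tri-gate mechanism are not described in this summary, so they are not modeled here. Assumptions, all hypothetical: feature vectors stand in for VLM image embeddings, zero-shot logits stand in for image-text similarity scores, and the names `DynamicCache`, `capacity`, `beta`, and `alpha` are illustrative. Each test sample is pseudo-labeled by the zero-shot prediction and kept only if it is among the lowest-entropy samples for that class; predictions add a cache-affinity term to the zero-shot logits, with no parameter updates.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    # L2-normalize so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Vector) -> Vector:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Vector) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)


class DynamicCache:
    """Hypothetical per-class cache of (feature, entropy) pairs built from pseudo-labeled test samples."""

    def __init__(self, num_classes: int, capacity: int = 3,
                 beta: float = 5.0, alpha: float = 1.0) -> None:
        self.num_classes = num_classes
        self.capacity = capacity  # max entries per class
        self.beta = beta          # sharpness of the affinity kernel
        self.alpha = alpha        # weight of cache logits vs. zero-shot logits
        self.slots: Dict[int, List[Tuple[Vector, float]]] = {c: [] for c in range(num_classes)}

    def update(self, feature: Vector, zero_shot_logits: Vector) -> None:
        # Pseudo-label with the zero-shot prediction; a wrong label here is
        # how errors can accumulate in a purely dynamic cache.
        probs = softmax(zero_shot_logits)
        label = max(range(self.num_classes), key=lambda c: probs[c])
        h = entropy(probs)
        entries = self.slots[label]
        if len(entries) < self.capacity:
            entries.append((normalize(feature), h))
            return
        # Cache full: replace the least confident (highest-entropy) entry if the new sample is more confident.
        worst = max(range(len(entries)), key=lambda i: entries[i][1])
        if h < entries[worst][1]:
            entries[worst] = (normalize(feature), h)

    def cache_logits(self, feature: Vector) -> Vector:
        # Affinity exp(-beta * (1 - cos)) summed per class over cached keys.
        f = normalize(feature)
        out = [0.0] * self.num_classes
        for c, entries in self.slots.items():
            for key, _ in entries:
                out[c] += math.exp(-self.beta * (1.0 - dot(f, key)))
        return out

    def predict(self, feature: Vector, zero_shot_logits: Vector) -> int:
        cache = self.cache_logits(feature)
        combined = [z + self.alpha * c for z, c in zip(zero_shot_logits, cache)]
        return max(range(self.num_classes), key=lambda c: combined[c])
```

For example, after caching one confident sample per class with `update([1.0, 0.0], [2.0, 0.0])` and `update([0.0, 1.0], [0.0, 2.0])`, a query `[0.9, 0.1]` whose zero-shot logits barely separate the classes is pulled toward class 0 by its affinity to the cached key. The entropy-based replacement keeps only confident samples, but because pseudo-labels come from the model itself, a confidently wrong prediction still enters the cache, which is the drift problem the abstract raises.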
Taxonomy
Topics: Emotion and Mood Recognition · Face recognition and analysis · Human Pose and Action Recognition
