CLIP-AUTT: Test-Time Personalization with Action Unit Prompting for Fine-Grained Video Emotion Recognition

Muhammad Osama Zeeshan, Masoumeh Sharafi, Benoît Savary, Alessandro Lameiras Koerich, Marco Pedersoli, and Eric Granger

arXiv:2603.27999 · cs.CV · April 1, 2026

TL;DR

This paper introduces CLIP-AUTT, a method that personalizes fine-grained video emotion recognition by dynamically adapting Action Unit prompts at test time, improving accuracy and robustness.

Contribution

The paper proposes a test-time personalization approach that uses AU prompts within CLIP to improve subject-specific emotion recognition without retraining the model.
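This page does not describe the paper's actual test-time objective. The sketch below only illustrates one generic way to adapt to a new subject without labels or retraining: entropy minimization in the spirit of TENT. It is applied to a hypothetical per-class logit offset `bias`, and the frozen model's frame scores are left untouched. The function names (`adapt_bias`, `mean_entropy`) and the toy logits are assumptions for illustration, not the authors' method.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: List[float]) -> float:
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def mean_entropy(frame_logits: List[List[float]], bias: List[float]) -> float:
    """Average prediction entropy over frames after adding the per-class offset."""
    return sum(
        entropy(softmax([z + b for z, b in zip(zs, bias)])) for zs in frame_logits
    ) / len(frame_logits)


def adapt_bias(frame_logits: List[List[float]], steps: int = 10, lr: float = 0.1) -> List[float]:
    """Hypothetical test-time step for ONE subject: learn a per-class logit offset
    by gradient descent on mean entropy over that subject's unlabeled frames.
    The frozen model's logits are never changed; only `bias` is updated."""
    k = len(frame_logits[0])
    bias = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for zs in frame_logits:
            p = softmax([z + b for z, b in zip(zs, bias)])
            h = entropy(p)
            for j in range(k):
                # Analytic gradient of softmax entropy: dH/dz_j = -p_j * (log p_j + H)
                grad[j] -= p[j] * (math.log(max(p[j], 1e-12)) + h) / len(frame_logits)
        bias = [b - lr * g for b, g in zip(bias, grad)]
    return bias


# Toy frame-level logits for one unseen subject (3 emotion classes); illustrative only.
subject_frames = [[2.0, 1.0, 0.5], [1.5, 1.2, 0.3]]
bias = adapt_bias(subject_frames)
print(mean_entropy(subject_frames, [0.0, 0.0, 0.0]), mean_entropy(subject_frames, bias))
```

Pure entropy minimization can drift toward always predicting the dominant class. Practical methods therefore restrict what is adapted, so read this only as an illustration of a label-free, per-subject objective.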

Findings

1. CLIP-AUTT outperforms state-of-the-art methods on three datasets.
2. The approach adapts effectively to unseen subjects in video emotion recognition.
3. AU-based prompts improve interpretability and fine-grained recognition.

Abstract

Personalization in emotion recognition (ER) is essential for an accurate interpretation of subtle and subject-specific expressive patterns. Recent advances in vision-language models (VLMs) such as CLIP demonstrate strong potential for leveraging joint image-text representations in ER. However, CLIP-based methods either depend on CLIP's contrastive pretraining or on LLMs to generate descriptive text prompts, which are noisy, computationally expensive, and fail to capture fine-grained expressions, leading to degraded performance. In this work, we leverage Action Units (AUs) as structured textual prompts within CLIP to model fine-grained facial expressions. AUs encode the subtle muscle activations underlying expressions, providing localized and interpretable semantic cues for more robust ER. We introduce CLIP-AU, a lightweight AU-guided temporal learning method that integrates…
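The core idea of the abstract is to use AUs as structured text prompts inside CLIP. The small sketch below illustrates it under several assumptions, none of them taken from the paper:

- a hypothetical prompt template (`build_au_prompt`);
- an illustrative EMFACS-style emotion-to-AU mapping;
- mean pooling over frames as a stand-in for the paper's temporal module;
- toy 3-d vectors in place of real CLIP text and image embeddings.

The `logit_scale` of 100 is a CLIP-like temperature chosen here as an assumption.

```python
import math
from typing import Dict, List

Vector = List[float]

# Standard FACS names for the AUs used below.
FACS_NAMES = {
    1: "inner brow raiser", 2: "outer brow raiser", 4: "brow lowerer",
    5: "upper lid raiser", 6: "cheek raiser", 12: "lip corner puller",
    15: "lip corner depressor", 26: "jaw drop",
}

# Illustrative EMFACS-style emotion-to-AU prototypes; NOT the paper's mapping.
EMOTION_AUS = {
    "happiness": [6, 12],
    "sadness": [1, 4, 15],
    "surprise": [1, 2, 5, 26],
}


def build_au_prompt(emotion: str, aus: List[int]) -> str:
    """Turn an AU list into a structured text prompt (hypothetical template)."""
    parts = ", ".join(f"AU{a} ({FACS_NAMES[a]})" for a in aus)
    return f"a video of a face showing {emotion}: {parts}"


def normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def classify_video(frame_embs: List[Vector], text_embs: Dict[str, Vector],
                   logit_scale: float = 100.0) -> Dict[str, float]:
    """CLIP-style scoring: mean-pool L2-normalized frame embeddings over time, then
    softmax over scaled cosine similarities with each class's AU-prompt embedding."""
    frames = [normalize(f) for f in frame_embs]
    video = normalize([sum(col) / len(frames) for col in zip(*frames)])
    labels = list(text_embs)
    logits = [logit_scale * dot(video, normalize(text_embs[c])) for c in labels]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(labels, exps)}


prompts = {e: build_au_prompt(e, aus) for e, aus in EMOTION_AUS.items()}
# Toy vectors standing in for CLIP text_encoder(prompts[e]) and image_encoder(frame).
text_embs = {"happiness": [1.0, 0.0, 0.0], "sadness": [0.0, 1.0, 0.0], "surprise": [0.0, 0.0, 1.0]}
frames = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
probs = classify_video(frames, text_embs)
print(prompts["happiness"])
print(max(probs, key=probs.get))
```

Spelling out AUs with their FACS names gives the text encoder localized, muscle-level cues instead of a single emotion word. This is the kind of fine-grained, interpretable signal the abstract attributes to AU prompting.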

Code & Models

Repositories

osamazeeshan/CLIP-AUTT (GitHub)
