Fully-attentive and interpretable: vision and video vision transformers for pain detection
Giacomo Fiorentini, Itir Onal Ertugrul, Albert Ali Salah

TL;DR
This paper introduces a fully-attentive vision transformer-based pipeline for automated pain detection from facial expressions, achieving state-of-the-art results and providing interpretable attention maps.
Contribution
The paper is the first to apply fully-attentive vision transformers to pain detection, demonstrating improved performance and interpretability over previous methods.
Findings
The proposed models outperform earlier work on binary pain detection from facial expressions.
Attention maps give reasonable interpretations of the model's predictions.
Hyperparameter analysis identifies important areas of the hyperparameter space and how they interact with vision and video vision transformers.
Abstract
Pain is a serious and costly issue globally, but to be treated, it must first be detected. Vision transformers are a top-performing architecture in computer vision, with little research on their use for pain detection. In this paper, we propose the first fully-attentive automated pain detection pipeline that achieves state-of-the-art performance on binary pain detection from facial expressions. The model is trained on the UNBC-McMaster dataset, after faces are 3D-registered and rotated to the canonical frontal view. In our experiments we identify important areas of the hyperparameter space and their interaction with vision and video vision transformers, obtaining 3 noteworthy models. We analyse the attention maps of one of our models, finding reasonable interpretations for its predictions. We also evaluate Mixup, an augmentation technique, and Sharpness-Aware Minimization, an optimizer,…
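The abstract names Mixup as one of the evaluated augmentation techniques. As a rough illustration of the general idea only (not the authors' implementation), Mixup blends pairs of training examples and their labels with a weight drawn from a Beta distribution. In the sketch below, the function name `mixup_batch`, the default `alpha=0.2`, and the array shapes are assumptions chosen for the example and do not come from the paper.

```python
import numpy as np


def mixup_batch(images, labels, alpha=0.2, rng=None):
    """Blend each example with a randomly chosen partner from the same batch.

    images: float array of shape (batch, ...), e.g. registered face crops.
    labels: float array of shape (batch, num_classes), one-hot encoded.
    alpha:  Beta distribution parameter (hypothetical default, not from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    # One mixing weight per batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Pair every example with another example via a random permutation.
    perm = rng.permutation(len(images))
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.random((4, 3, 8, 8))       # four toy "images"
    targets = np.eye(2)[[0, 1, 1, 0]]      # binary labels: no pain / pain
    mixed_x, mixed_y, lam = mixup_batch(batch, targets, rng=rng)
    print(mixed_x.shape, mixed_y.shape)
```

Because the labels are blended with the same weight as the inputs, each mixed label row still sums to one, so it can be fed to a standard soft-label cross-entropy loss.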
Taxonomy
Topics: Pain Management and Opioid Use · Pain Mechanisms and Treatments · Emotion and Mood Recognition
Methods: Sharpness-Aware Minimization · Mixup
