A Full Transformer-based Framework for Automatic Pain Estimation using Videos
Stefanos Gkikas, Manolis Tsiknakis

TL;DR
This paper introduces a comprehensive transformer-based framework for automatic pain estimation from videos, achieving state-of-the-art performance and demonstrating strong generalization across pain assessment tasks.
Contribution
It presents a novel full transformer-based framework with specialized transformer modules for improved pain estimation from video data.
Findings
Achieved state-of-the-art performance on the BioVid database
Demonstrated high efficacy and efficiency in pain estimation
Showed strong generalization across primary pain assessment tasks
Abstract
The automatic estimation of pain is essential in designing an optimal pain management system that offers reliable assessment and reduces the suffering of patients. In this study, we present a novel full transformer-based framework consisting of a Transformer in Transformer (TNT) model and a Transformer leveraging cross-attention and self-attention blocks. Using videos from the BioVid database, we demonstrate state-of-the-art performance, showing the efficacy, efficiency, and generalization capability of the framework across all the primary pain estimation tasks.
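The abstract names cross-attention and self-attention blocks without describing how they are wired together. As a rough illustration of how the two differ, here is a minimal pure-Python sketch of scaled dot-product attention. It rests on several assumptions that are not from the paper: a small set of query vectors (`latents`) attends over per-frame embeddings (`frames`) in the cross-attention step, and the result is then refined by self-attention. Learned projections, multiple heads, residual connections, and layer normalization are omitted for brevity, and all names and values are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def self_attention(x):
    """Queries, keys, and values all come from the same sequence."""
    return attention(x, x, x)


def cross_attention(latents, x):
    """Queries come from one sequence; keys and values from another."""
    return attention(latents, x, x)


# Hypothetical inputs: 4 frame embeddings of dimension 3, 2 query vectors.
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
latents = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]

fused = cross_attention(latents, frames)  # shape: 2 x 3
refined = self_attention(fused)           # shape: 2 x 3
```

In this sketch the cross-attention output has as many rows as there are query vectors, regardless of how many frames the video contains, which is one common reason to place a cross-attention step ahead of self-attention when the input sequence is long.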
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax
