Twins-PainViT: Towards a Modality-Agnostic Vision Transformer Framework for Multimodal Automatic Pain Assessment using Facial Videos and fNIRS
Stefanos Gkikas, Manolis Tsiknakis

TL;DR
This paper introduces Twins-PainViT, a modality-agnostic vision transformer framework that combines facial videos and fNIRS data for automatic pain assessment, reaching 46.76% accuracy on the multilevel pain assessment task without domain-specific models.
Contribution
The study presents a multimodal, modality-agnostic transformer architecture with a dual ViT configuration. It uses waveform representations both for the fNIRS signals and for the embeddings extracted from the two modalities, so no domain-specific model is required for either input.
Findings
Achieved 46.76% accuracy in multilevel pain assessment
Demonstrated effectiveness of waveform representations for multimodal data
Evaluated the modality-agnostic approach as a submission to the AI4PAIN Grand Challenge
Abstract
Automatic pain assessment plays a critical role in advancing healthcare and optimizing pain management strategies. This study has been submitted to the First Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed multimodal framework utilizes facial videos and fNIRS and presents a modality-agnostic approach, alleviating the need for domain-specific models. It employs a dual ViT configuration and adopts waveform representations for the fNIRS signals as well as for the embeddings extracted from the two modalities. The results demonstrate the efficacy of the proposed method, which achieves an accuracy of 46.76% in the multilevel pain assessment task.
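To make the idea of a waveform representation concrete, the following is a minimal sketch of how a 1D signal (an fNIRS channel or an embedding vector) could be rasterized into a 2D image that a ViT can consume. It is an illustration only, not the authors' pipeline: the function name `waveform_image`, the image height, and the sine-wave stand-in for an embedding are all assumptions made for this example.

```python
import numpy as np

def waveform_image(signal, height=64):
    """Rasterize a 1D signal into a (height, len(signal)) binary image.

    Hypothetical helper for illustration; not taken from the paper.
    """
    x = np.asarray(signal, dtype=float)
    lo, hi = x.min(), x.max()
    # Normalize to [0, 1]; a constant signal is mapped to the middle of the image.
    norm = (x - lo) / (hi - lo) if hi > lo else np.full_like(x, 0.5)
    # Row 0 is the top of the image, so larger values get smaller row indices.
    rows = np.round((1.0 - norm) * (height - 1)).astype(int)
    img = np.zeros((height, x.size), dtype=np.uint8)
    img[rows, np.arange(x.size)] = 1  # one lit pixel per time step / dimension
    return img

# Stand-in for a real embedding or fNIRS channel (assumed data).
embedding = np.sin(np.linspace(0, 4 * np.pi, 128))
img = waveform_image(embedding)
print(img.shape)
```

Because every input ends up as the same kind of image, the same image model can in principle process signals from different sources, which is the sense in which such a representation supports a modality-agnostic design.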
Taxonomy
Topics: Musculoskeletal pain and rehabilitation
