GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement
Hao Wang, Euijoon Ahn, Jinman Kim

TL;DR
This paper empirically investigates how general video transformer architectures can be adapted for remote physiological measurement from facial videos, proposing guidelines that improve robustness without specialized modules.
Contribution
It introduces practical guidelines for adapting general video transformers to RPM, eliminating the need for RPM-specific modules and enhancing robustness across datasets.
Findings
GVT2RPM achieves accuracy comparable to or better than that of RPM-specific methods.
The proposed guidelines generalize across different video transformer architectures.
The method demonstrates robustness in intra- and cross-dataset evaluations.
Abstract
Remote physiological measurement (RPM) is an essential tool for healthcare monitoring because it enables the measurement of physiological signs, e.g., heart rate, in a remote setting via physical wearables. Recently, video-based RPM using facial videos has advanced rapidly. However, adopting facial videos for RPM in clinical settings depends largely on accuracy and robustness (i.e., performing well across patient populations). Fortunately, the capability of state-of-the-art transformer architectures in general (natural) video understanding has produced marked improvements, and these gains have carried over to facial understanding, including RPM. However, existing RPM methods usually require RPM-specific modules, e.g., temporal difference convolution and handcrafted feature maps. Although these customized modules can increase accuracy, they have not been shown to be robust across…
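The abstract cites temporal difference convolution as an example of an RPM-specific module. As a rough illustration of what such a module computes, here is a minimal single-channel sketch in plain Python. It assumes the formulation used in earlier RPM models such as PhysFormer: a vanilla 3×3×3 convolution minus θ times the summed kernel weights of the previous and next time slices, applied to the centre voxel. The function names, the single-channel restriction, and the zero padding are illustrative assumptions, not details taken from GVT2RPM.

```python
from typing import List

# Hypothetical types for this sketch: one channel, indexed [t][y][x].
Video = List[List[List[float]]]   # shape [T][H][W]
Kernel = List[List[List[float]]]  # shape [3][3][3], indexed [dt][dy][dx]


def conv3d_same(x: Video, w: Kernel) -> Video:
    """Vanilla 3x3x3 convolution (cross-correlation) with zero padding of 1."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for dt in range(3):
                    for di in range(3):
                        for dj in range(3):
                            tt, ii, jj = t + dt - 1, i + di - 1, j + dj - 1
                            # Out-of-bounds neighbours count as zero (padding).
                            if 0 <= tt < T and 0 <= ii < H and 0 <= jj < W:
                                acc += w[dt][di][dj] * x[tt][ii][jj]
                out[t][i][j] = acc
    return out


def temporal_difference_conv(x: Video, w: Kernel, theta: float) -> Video:
    """Temporal difference convolution (assumed formulation):
    vanilla conv minus theta * (sum of weights in the t-1 and t+1 kernel
    slices) * the centre voxel x[t][i][j]. theta = 0 gives vanilla conv."""
    vanilla = conv3d_same(x, w)
    if theta == 0.0:
        return vanilla
    # Sum of all weights in the previous (dt=0) and next (dt=2) time slices.
    k_diff = sum(sum(row) for row in w[0]) + sum(sum(row) for row in w[2])
    T, H, W = len(x), len(x[0]), len(x[0][0])
    return [
        [[vanilla[t][i][j] - theta * k_diff * x[t][i][j] for j in range(W)]
         for i in range(H)]
        for t in range(T)
    ]


if __name__ == "__main__":
    # A static 3x3x3 clip and an all-ones kernel.
    static = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
    ones_kernel = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
    print(temporal_difference_conv(static, ones_kernel, 0.0)[1][1][1])  # 27.0
    print(temporal_difference_conv(static, ones_kernel, 1.0)[1][1][1])  # 9.0
```

With θ = 1, at interior positions the output equals the centre-slice response plus the adjacent-slice weights applied to frame differences x(t±1, ·) − x(t), so static content such as a still background passes through only the centre slice (27 drops to 9 in the example); θ = 0 recovers the vanilla convolution.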
Taxonomy
Topics: Advanced Computing and Algorithms · Industrial Vision Systems and Defect Detection · Muscle activation and electromyography studies
Methods: ALIGN · Convolution
