Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement

Zijie Yue; Miaojing Shi; Hanli Wang; Shuai Ding; Qijun Chen; Shanlin Yang

arXiv:2407.08507 · cs.CV · February 18, 2025 · 2 cites

TL;DR

This paper introduces a novel self-supervised framework that leverages vision-language models to improve remote physiological measurement from facial videos, effectively estimating vital signs without extensive labeled data.

Contribution

It is the first to adapt vision-language models for frequency-aware, self-supervised remote physiological measurement, integrating contrastive and generative learning mechanisms.

Findings

01. Outperforms existing self-supervised methods on four benchmarks.

02. Estimates vital signs without labeled PPG signals.

03. Integrates vision-language models to capture frequency-related knowledge.

Abstract

Facial video-based remote physiological measurement is a promising research area for detecting human vital signs (e.g., heart rate, respiration frequency) in a non-contact way. Conventional approaches are mostly supervised and require extensive collections of facial videos with synchronously recorded photoplethysmography (PPG) signals. To reduce this dependence on labeled data, self-supervised learning has recently gained attention; however, its performance is limited by the lack of ground-truth PPG signals. In this paper, we propose a novel self-supervised framework that successfully integrates the popular vision-language models (VLMs) into the remote physiological measurement task. Given a facial video, we first augment its positive and negative video samples with varying remote PPG (rPPG) signal frequencies. Next, we introduce a frequency-oriented vision-text pair generation method by carefully creating…
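
The abstract does not say how the frequency augmentation is carried out. As a rough illustration only, the sketch below assumes plain temporal resampling with linear interpolation: playing a clip back at `speed` times its original rate scales any periodic signal in it, including the pulse, by about the same factor. The function names `resample_clip` and `dominant_frequency`, the synthetic clip, and the chosen speeds are all hypothetical and are not taken from the paper.

```python
import numpy as np

def resample_clip(clip, speed):
    """Temporally resample a clip of shape (T, H, W, C) by a factor `speed`.

    Assumption for illustration: linear interpolation between neighbouring
    frames. With speed > 1 a periodic signal at f Hz shows up at roughly
    speed * f Hz when the result is played at the original frame rate.
    """
    clip = np.asarray(clip, dtype=np.float64)
    t = clip.shape[0]
    n_out = int(np.floor((t - 1) / speed)) + 1
    pos = np.arange(n_out) * speed                 # fractional source positions
    lo = np.clip(np.floor(pos).astype(int), 0, t - 1)
    hi = np.minimum(lo + 1, t - 1)
    w = (pos - lo).reshape(-1, 1, 1, 1)            # interpolation weights
    return (1.0 - w) * clip[lo] + w * clip[hi]

def dominant_frequency(signal, fps):
    """Return the frequency (Hz) of the largest non-DC spectral peak."""
    signal = np.asarray(signal, dtype=np.float64)
    signal = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    return float(freqs[np.argmax(spectrum)])

# Synthetic stand-in for a facial video: uniform brightness pulsing at 1 Hz.
fps = 30.0
times = np.arange(300) / fps
pulse = 0.5 + 0.1 * np.sin(2.0 * np.pi * 1.0 * times)
clip = pulse.reshape(-1, 1, 1, 1) * np.ones((1, 2, 2, 3))

# Two resampled variants whose apparent pulse frequency differs from the source.
variants = {speed: resample_clip(clip, speed) for speed in (1.5, 2.0)}
for speed, video in variants.items():
    trace = video.mean(axis=(1, 2, 3))             # spatially averaged intensity
    print(speed, dominant_frequency(trace, fps))
```

In a contrastive setup, clips sharing a frequency would be pulled together and clips with shifted frequencies pushed apart; which resampled variants count as positive or negative in the paper's framework is not stated in the text above, so the sketch leaves that choice open.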

Taxonomy

Topics: EEG and Brain-Computer Interfaces

Methods: ALIGN · Contrastive Learning