Instantaneous Physiological Estimation using Video Transformers
Ambareesh Revanur, Ananyananda Dasari, Conrad S. Tucker, Laszlo A. Jeni

TL;DR
This paper introduces a video Transformer that estimates instantaneous heart rate and respiration rate from face videos. On the V4V benchmark it outperforms prior shallow and deep learning methods for respiration rate and reaches an instantaneous-MAE of 13.0 bpm for heart rate.
Contribution
The study presents a video Transformer architecture that estimates instantaneous heart rate and respiration rate directly from face videos, and formulates the loss in the frequency domain to reduce the effect of spatial and temporal alignment errors.
Findings
Outperformed both shallow and deep learning based methods for instantaneous respiration rate estimation on the V4V benchmark
Achieved an instantaneous-MAE of 13.0 bpm for heart rate
Estimated instantaneous, rather than windowed, heart rate and respiration rate from face videos
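To make the heart-rate figure above concrete, here is a minimal sketch of a per-frame error metric. It assumes that instantaneous-MAE means the mean absolute difference between predicted and reference rates, taken frame by frame; this page does not give the benchmark's exact definition, and the function name `instantaneous_mae` is hypothetical.

```python
import numpy as np

def instantaneous_mae(pred_bpm, ref_bpm):
    """Mean absolute error between per-frame predicted and reference rates.

    Assumption: one rate value (in beats or breaths per minute) per video frame.
    """
    pred = np.asarray(pred_bpm, dtype=float)
    ref = np.asarray(ref_bpm, dtype=float)
    if pred.shape != ref.shape:
        raise ValueError("prediction and reference must have the same shape")
    return float(np.mean(np.abs(pred - ref)))

# Toy values, not data from the paper: three frames of predicted vs. reference heart rate.
toy_error = instantaneous_mae([70, 72, 75], [71, 72, 73])
```

Unlike a windowed score, this metric penalizes every frame where the estimate drifts from the reference, so short-lived errors are not averaged away.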
Abstract
Video-based physiological signal estimation has been limited primarily to predicting episodic scores in windowed intervals. While these intermittent values are useful, they provide an incomplete picture of patients' physiological status and may lead to late detection of critical conditions. We propose a video Transformer for estimating instantaneous heart rate and respiration rate from face videos. Physiological signals are typically confounded by alignment errors in space and time. To overcome this, we formulated the loss in the frequency domain. We evaluated the method on the large scale Vision-for-Vitals (V4V) benchmark. It outperformed both shallow and deep learning based methods for instantaneous respiration rate estimation. In the case of heart-rate estimation, it achieved an instantaneous-MAE of 13.0 beats-per-minute.
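The abstract states that the loss is formulated in the frequency domain to cope with alignment errors. The sketch below illustrates one reason that can help: the magnitude spectrum of a periodic signal does not change under a time shift, so a loss on spectra ignores a constant lag between prediction and reference. This is not the paper's loss, whose exact form is not given on this page; `spectral_loss`, `time_domain_loss`, the 25 Hz sampling rate, and the 1.2 Hz test signal are all illustrative assumptions.

```python
import numpy as np

def spectral_loss(pred, target, eps=1e-8):
    """Mean squared error between normalized magnitude spectra (hypothetical loss)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Remove the mean so the DC component does not dominate the spectrum.
    pred = pred - pred.mean()
    target = target - target.mean()
    p = np.abs(np.fft.rfft(pred))
    t = np.abs(np.fft.rfft(target))
    # Normalize each spectrum to unit sum so overall amplitude does not matter.
    p = p / (p.sum() + eps)
    t = t / (t.sum() + eps)
    return float(np.mean((p - t) ** 2))

def time_domain_loss(pred, target):
    """Plain sample-by-sample mean squared error, for comparison."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

# Synthetic pulse-like signal: 10 s at an assumed 25 Hz, 1.2 Hz (72 cycles per minute).
fs = 25
n = np.arange(10 * fs)
reference = np.sin(2 * np.pi * 1.2 * n / fs)
# Same signal delayed by 5 frames (0.2 s), mimicking a temporal misalignment.
misaligned = np.roll(reference, 5)

print("time-domain loss:", time_domain_loss(misaligned, reference))
print("spectral loss:   ", spectral_loss(misaligned, reference))
```

In this toy setup the time-domain loss is large even though the two signals carry the same rate, while the spectral loss is close to zero. A real training loss would need a differentiable implementation in the model's framework.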
Taxonomy
Topics: Non-Invasive Vital Sign Monitoring · ECG Monitoring and Analysis · EEG and Brain-Computer Interfaces
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Residual Connection · Layer Normalization · Label Smoothing · Dropout · Dense Connections
