Use of a Multiscale Vision Transformer to predict Nursing Activities Score from Low Resolution Thermal Videos in an Intensive Care Unit
Isaac YL Lee, Thanh Nguyen-Duc, Ryo Ueno, Jesse Smith, Peter Y Chan

TL;DR
This study demonstrates that a Multiscale Vision Transformer can passively and accurately predict nursing workload scores from low-resolution thermal videos in an ICU, potentially improving staff workload monitoring.
Contribution
The paper introduces the use of a Multiscale Vision Transformer for direct and indirect prediction of the Nursing Activities Score from thermal videos, outperforming the R(2+1)D and ResNet50-LSTM models.
Findings
Direct prediction yields lower MSE than indirect prediction.
MViTv2 outperforms R(2+1)D and ResNet50-LSTM models.
Passive thermal video analysis can effectively monitor ICU staff workload.
Abstract
Excessive caregiver workload in hospital nurses has been implicated in poorer patient care and increased worker burnout. Measurement of this workload in the Intensive Care Unit (ICU) is often done using the Nursing Activities Score (NAS), but this is usually recorded manually and sporadically. Previous work has made use of Ambient Intelligence (AmI) by using computer vision to passively derive caregiver-patient interaction times to monitor staff workload. In this letter, we propose using a Multiscale Vision Transformer (MViT) to passively predict the NAS from low-resolution thermal videos recorded in an ICU. A total of 458 videos were obtained from an ICU in Melbourne, Australia and used to train an MViTv2 model using an indirect prediction method and a direct prediction method. The indirect method predicted 1 of 8 potentially identifiable NAS activities from the video before inferring the NAS. The direct…
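The difference between the two prediction routes, and the MSE used to compare them, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The per-activity NAS values in `ACTIVITY_NAS_POINTS` are placeholders rather than real NAS item weights, all function names are hypothetical, and because the abstract is truncated before it describes the direct method, the sketch assumes that the direct route outputs the NAS as a single scalar.

```python
from typing import Sequence

# Hypothetical lookup: one NAS value per activity class (8 classes).
# Placeholder numbers for illustration only, not real NAS item weights.
ACTIVITY_NAS_POINTS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]


def indirect_predict(class_scores: Sequence[float]) -> float:
    """Indirect route: pick the highest-scoring activity, then look up its NAS value."""
    if len(class_scores) != len(ACTIVITY_NAS_POINTS):
        raise ValueError("expected one score per activity class")
    best_class = max(range(len(class_scores)), key=lambda i: class_scores[i])
    return ACTIVITY_NAS_POINTS[best_class]


def direct_predict(regression_output: float) -> float:
    """Direct route (assumed): the model's scalar output is used as the NAS prediction."""
    return float(regression_output)


def mean_squared_error(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Average of squared differences between predicted and recorded NAS values."""
    if len(predictions) != len(targets) or len(predictions) == 0:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)
```

In this sketch the indirect route can only ever return one of eight fixed values, while the direct route can return any number, which is one plausible reason a direct regressor could reach a lower MSE.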
Taxonomy
Methods: Attention Is All You Need · Batch Normalization · (2+1)D Convolution · Residual Connection · Softmax · Average Pooling · Global Average Pooling · Layer Normalization · R(2+1)D
