ViFusionTST: Deep Fusion of Time-Series Image Representations from Load Signals for Early Bed-Exit Prediction
Hao Liu, Yu Hu, Rakiba Rayhana, Ling Bai, and Zheng Liu

TL;DR
This paper introduces ViFusionTST, a deep learning model that fuses multiple image representations of load signals from bed sensors to accurately predict early bed-exit intent, enhancing fall prevention in healthcare settings.
Contribution
The paper presents a novel dual-stream Swin Transformer that fuses waveform and texture image representations of load signals for improved early bed-exit prediction.
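The abstract describes the fusion step as cross-attention between the two streams. A minimal single-head sketch of that idea follows, with tokens from one stream acting as queries over tokens from the other. Everything here is an assumption made for illustration: the token counts, the embedding size, the random projection matrices, and the choice of which stream supplies the queries are not taken from the paper, and the Swin Transformer backbones that would produce the tokens are omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: queries from one stream, keys/values from the other."""
    Q = query_tokens @ Wq      # (n_query, d)
    K = context_tokens @ Wk    # (n_context, d)
    V = context_tokens @ Wv    # (n_context, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each query row sums to 1
    return weights @ V, weights

# Hypothetical token embeddings standing in for the two backbone outputs.
rng = np.random.default_rng(0)
d = 8
line_tokens = rng.normal(size=(5, d))     # assumed: tokens from the line-plot stream
texture_tokens = rng.normal(size=(7, d))  # assumed: tokens from the texture-map stream
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

fused, attn = cross_attention(line_tokens, texture_tokens, Wq, Wk, Wv)
```

Each row of `attn` is a learned-style weighting over the other stream's tokens, which is one way to read the abstract's phrase "data-driven modality weights"; in a trained model the projections would be learned rather than random.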
Findings
Achieved 0.885 accuracy and 0.794 F1 score on real-world data
Outperformed recent 1D and 2D time-series baselines
Demonstrated effectiveness of image-based load signal fusion for fall prevention
Abstract
Bed-related falls remain a major source of injury in hospitals and long-term care facilities, yet many commercial alarms trigger only after a patient has already left the bed. We show that early bed-exit intent can be predicted using only one low-cost load cell mounted under a bed leg. The resulting load signals are first converted into a compact set of complementary images: an RGB line plot that preserves raw waveforms and three texture maps (recurrence plot, Markov transition field, and Gramian angular field) that expose higher-order dynamics. We introduce ViFusionTST, a dual-stream Swin Transformer that processes the line plot and texture maps in parallel and fuses them through cross-attention to learn data-driven modality weights. To provide a realistic benchmark, we collected six months of continuous data from 95 beds in a long-term-care facility. On this real-world dataset…
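The three texture maps named in the abstract can be sketched with NumPy using their standard textbook definitions. This is an illustration under stated assumptions, not the authors' pipeline: the recurrence threshold `eps`, the number of quantile bins `n_bins`, the choice of the summation variant of the Gramian angular field, the synthetic sine signal, and the final stacking into a three-channel array are all choices made here.

```python
import numpy as np

def minmax_scale(x, lo=-1.0, hi=1.0):
    # Rescale a 1D series to [lo, hi]; a constant series maps to zeros.
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return lo + (hi - lo) * (x - x.min()) / span

def recurrence_plot(x, eps=0.1):
    # R[i, j] = 1 when samples i and j are within eps of each other (eps is assumed).
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= eps).astype(float)

def gramian_angular_field(x):
    # Summation variant: G[i, j] = cos(phi_i + phi_j), with phi = arccos(scaled x).
    s = np.clip(minmax_scale(x), -1.0, 1.0)
    phi = np.arccos(s)
    return np.cos(phi[:, None] + phi[None, :])

def markov_transition_field(x, n_bins=4):
    # Quantile-bin the series, estimate first-order transition probabilities
    # between bins, then set M[i, j] = P(bin of x_i -> bin of x_j).
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)  # bin index in 0 .. n_bins - 1
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):
        W[a, b] += 1
    row = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row, out=np.zeros_like(W), where=row > 0)
    return W[q[:, None], q[None, :]]

# Synthetic stand-in for a window of load-cell readings (not real data).
signal = np.sin(np.linspace(0, 4 * np.pi, 32))
rp = recurrence_plot(signal)
mtf = markov_transition_field(signal)
gaf = gramian_angular_field(signal)
texture = np.stack([rp, mtf, gaf], axis=-1)  # assumed layout: one map per channel
```

All three maps are square with side length equal to the window length, so a longer window would normally be downsampled or aggregated before being passed to an image backbone.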
