Scalable Numerical Embeddings for Multivariate Time Series: Enhancing Healthcare Data Representation Learning
Chun-Kai Huang, Yi-Hsien Hsieh, Ta-Jung Chien, Li-Cheng Chien, Shao-Hua Sun, Tung-Hung Su, Jia-Horng Kao, and Che Lin

TL;DR
This paper introduces SCANE, a scalable numerical embedding framework, and SUMMIT, a transformer model, which together aim to improve multivariate time series analysis in healthcare by handling irregular sampling and missing data.
Contribution
The paper presents a novel token-based embedding approach that eliminates the need for imputation and enhances representation learning for irregular, missing data in multivariate time series.
Findings
SUMMIT outperforms state-of-the-art methods on EHR datasets.
SCANE effectively regularizes feature embeddings for better generalization.
The approach improves prediction accuracy in healthcare time series with high missingness.
Abstract
Multivariate time series (MTS) data, when sampled irregularly and asynchronously, often present extensive missing values. Conventional methodologies for MTS analysis tend to rely on temporal embeddings based on timestamps that necessitate subsequent imputations, yet these imputed values frequently deviate substantially from their actual counterparts, thereby compromising prediction accuracy. Furthermore, these methods typically fail to provide robust initial embeddings for values infrequently observed or even absent within the training set, posing significant challenges to model generalizability. In response to these challenges, we propose SCAlable Numerical Embedding (SCANE), a novel framework that treats each feature value as an independent token, effectively bypassing the need for imputation. SCANE regularizes the traits of distinct feature embeddings and enhances representational…
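The token-per-value idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how a numeric value enters its token, so scaling a per-feature embedding vector by the observed value is an assumption made here for illustration (suggested by the name "scalable numerical embedding"), not the authors' confirmed formulation. The function name `tokenize_observations`, the use of `None` for missing values, and the toy embeddings are all hypothetical.

```python
from typing import List, Optional, Tuple

Token = Tuple[int, List[float]]


def tokenize_observations(
    row: List[Optional[float]],
    feature_embeddings: List[List[float]],
) -> List[Token]:
    """Turn one time step into a list of (feature_index, token_vector) pairs.

    Assumption: each token is the feature's embedding vector scaled by the
    observed value. Missing values (None) produce no token, so nothing is
    imputed.
    """
    tokens: List[Token] = []
    for idx, value in enumerate(row):
        if value is None:
            continue  # skip missing measurements instead of filling them in
        embedding = feature_embeddings[idx]
        tokens.append((idx, [value * weight for weight in embedding]))
    return tokens


# Toy example: three features, the second one unobserved at this time step.
row = [0.5, None, 2.0]
feature_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens = tokenize_observations(row, feature_embeddings)
```

Under this assumption, a value never seen during training still maps to a well-defined token, because it only rescales an embedding that was learned for its feature. A downstream attention model would then operate on the variable-length token list, typically with a padding mask.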
Taxonomy
Topics: Machine Learning in Healthcare · Time Series Analysis and Forecasting
Methods: Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections
