Learning Transferable Sensor Models via Language-Informed Pretraining

Yuliang Chen; Arvind Pillai; Yu Yvonne Wu; Tess Z. Griffin; Lisa Marsch; Michael V. Heinz; Nicholas C. Jacobson; Andrew Campbell

arXiv:2603.11950 · cs.AI · March 13, 2026

TL;DR

SLIP is a framework for learning language-aligned sensor representations. It supports zero-shot transfer and flexible sensor configurations across diverse datasets, with the aim of improving semantic understanding and downstream task performance.

Contribution

SLIP introduces a flexible, language-informed pretraining method that combines contrastive alignment with captioning, so that sensor models generalize across different sensor setups without retraining.
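
To make "contrastive alignment with captioning" concrete, here is a minimal NumPy sketch of a combined objective of that general kind: a symmetric InfoNCE term over paired sensor and caption embeddings, plus a next-token cross-entropy term for the caption. This is an illustration under stated assumptions, not the SLIP implementation. The function names, the temperature of 0.07, and the `caption_weight` parameter are all hypothetical and do not come from the paper.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def contrastive_loss(sensor_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (sensor, caption) embeddings.

    Row i of sensor_emb is assumed to be paired with row i of text_emb.
    The temperature value is an assumption, not taken from the paper.
    """
    s = sensor_emb / np.linalg.norm(sensor_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature          # (batch, batch) cosine similarities
    idx = np.arange(len(s))
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # sensor -> text
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> sensor
    return 0.5 * (loss_s2t + loss_t2s)

def captioning_loss(token_logits, target_ids):
    """Mean next-token cross-entropy.

    token_logits: (batch, seq_len, vocab) decoder outputs.
    target_ids:   (batch, seq_len) integer caption tokens.
    """
    logp = log_softmax(token_logits, axis=-1)
    picked = np.take_along_axis(logp, target_ids[..., None], axis=-1)
    return -picked.mean()

def combined_loss(sensor_emb, text_emb, token_logits, target_ids,
                  caption_weight=1.0):
    # caption_weight is a hypothetical knob balancing the two terms.
    return (contrastive_loss(sensor_emb, text_emb)
            + caption_weight * captioning_loss(token_logits, target_ids))
```

In a sketch like this, the contrastive term pulls matching sensor and caption embeddings together in a shared space, while the captioning term trains a decoder to generate the caption text. How SLIP actually weights or schedules the two terms is not stated in this summary.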

Findings

01

Achieves 77.14% linear-probing accuracy, outperforming baselines.

02

Demonstrates effective zero-shot transfer across 11 datasets.

03

Reaches 64.83% accuracy in sensor-based question answering.
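
As an illustration of what zero-shot transfer generally means for a language-aligned encoder, the sketch below labels each sensor embedding with the class whose text embedding is most similar in cosine terms. It assumes embeddings have already been produced by some sensor and text encoders. The function name and the toy vectors are hypothetical, and this is not the paper's evaluation code.

```python
import numpy as np

def zero_shot_predict(sensor_emb, label_text_emb):
    """Return, for each sensor embedding, the index of the closest label embedding.

    sensor_emb:     (n_samples, dim) embeddings from a sensor encoder.
    label_text_emb: (n_classes, dim) embeddings of class-name prompts.
    """
    s = sensor_emb / np.linalg.norm(sensor_emb, axis=1, keepdims=True)
    t = label_text_emb / np.linalg.norm(label_text_emb, axis=1, keepdims=True)
    return (s @ t.T).argmax(axis=1)  # highest cosine similarity wins

# Toy example with made-up 2-D embeddings for two classes.
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
samples = np.array([[0.9, 0.1], [0.2, 0.8], [2.0, 1.0]])
predictions = zero_shot_predict(samples, labels)
```

No classifier is trained on the target dataset here; the class set is defined entirely by the text prompts, which is what allows new label sets to be evaluated without retraining.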

Abstract

Modern sensing systems generate large volumes of unlabeled multivariate time-series data. This abundance of unlabeled data makes self-supervised learning (SSL) a natural approach for learning transferable representations. However, most existing approaches are optimized for reconstruction or forecasting objectives and often fail to capture the semantic structure required for downstream classification and reasoning tasks. While recent sensor-language alignment methods improve semantic generalization through captioning and zero-shot transfer, they are limited to fixed sensor configurations, such as predefined channel sets, signal lengths, or temporal resolutions, which hinders cross-domain applicability. To address these gaps, we introduce SLIP (Sensor Language-Informed Pretraining), an open-source framework for learning language-aligned…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech Recognition and Synthesis