# Speech Foundation Models Generalize to Time Series Tasks from Wearable Sensor Data

**Authors:** Jaya Narain, Zakaria Aldeneh, Shirley Ren

arXiv: 2509.00221 · 2025-11-25

## TL;DR

This paper shows that speech foundation models such as HuBERT and wav2vec 2.0 transfer to time-series data from wearable sensors. Simple probes trained on their features achieve state-of-the-art results on mood classification, arrhythmia detection, and activity classification, a step toward models that unify speech and sensor modalities.

## Contribution

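The paper shows that representations learned by speech foundation models generalize to wearable sensor data: probes trained on HuBERT and wav2vec 2.0 features outperform probes trained on features from self-supervised models that were trained directly on modality-specific datasets.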

## Key findings

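- Probes on speech-model features outperform probes on features from modality-specific self-supervised models for mood classification, arrhythmia detection, and activity classification.
- The convolutional feature encoders of speech models are particularly relevant for wearable sensor applications.
- Simple probing of speech-model features improves performance on data-scarce time-series tasks.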

## Abstract

Speech and sensor time-series data both encode information in the time and frequency domains, such as spectral powers and waveform shapelets. We show that speech foundation models learn representations that generalize beyond the speech domain and achieve state-of-the-art performance on diverse time-series tasks from wearable sensors. Probes trained on features extracted from HuBERT and wav2vec 2.0 outperform those extracted from self-supervised models trained directly on modality-specific datasets for mood classification, arrhythmia detection, and activity classification tasks. We find that the convolutional feature encoders of speech models are particularly relevant for wearable sensor applications. The proposed approach enhances performance on data-scarce time-series tasks using simple probing methods. This work takes a step toward developing generalized time-series models that unify speech and sensor modalities.
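The probing setup described in the abstract can be illustrated with a minimal sketch: a frozen encoder turns each raw signal into a fixed-size feature vector, and only a small linear classifier is trained on top. Everything specific here is an assumption made for illustration and is not taken from the paper: the data are synthetic sinusoids in noise, the "encoder" is a hand-built cosine filter bank standing in for the convolutional feature encoder of a pretrained speech model, and the names `make_signals`, `frozen_encoder`, and `train_probe` are hypothetical.

```python
import numpy as np

FS = 64      # assumed sampling rate in Hz (illustrative, not from the paper)
WIDTH = 32   # filter length in samples (0.5 s at FS)
STRIDE = 4   # hop between successive frames


def make_signals(n, rng, seconds=4):
    """Synthetic two-class 'sensor' data: 2 Hz vs 6 Hz sinusoids in noise."""
    t = np.arange(FS * seconds) / FS
    labels = rng.integers(0, 2, size=n)
    freqs = np.where(labels == 1, 6.0, 2.0)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n)
    x = np.sin(2.0 * np.pi * freqs[:, None] * t[None, :] + phase[:, None])
    return x + 0.5 * rng.standard_normal(x.shape), labels


def frozen_encoder(x):
    """Stand-in for a pretrained conv encoder: fixed filters, ReLU, mean-pool."""
    n = np.arange(WIDTH) / FS
    kernels = np.stack([np.cos(2.0 * np.pi * f * n) for f in (2.0, 4.0, 6.0, 8.0)])
    # Strided sliding windows play the role of a 1-D convolution over time.
    windows = np.lib.stride_tricks.sliding_window_view(x, WIDTH, axis=1)[:, ::STRIDE]
    frames = np.maximum(windows @ kernels.T, 0.0)  # (n, n_frames, n_filters)
    return frames.mean(axis=1)                     # (n, n_filters)


def train_probe(feats, labels, steps=300, lr=0.5):
    """Linear probe: logistic regression fit by plain gradient descent."""
    w, b = np.zeros(feats.shape[1]), 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
        err = probs - labels
        w -= lr * (feats.T @ err) / len(labels)
        b -= lr * err.mean()
    return w, b


rng = np.random.default_rng(0)
x_train, y_train = make_signals(20, rng)   # deliberately small labeled set
x_test, y_test = make_signals(200, rng)

# The encoder is never updated; only the probe's weights are learned.
feats_train, feats_test = frozen_encoder(x_train), frozen_encoder(x_test)
mu, sigma = feats_train.mean(axis=0), feats_train.std(axis=0) + 1e-8
w, b = train_probe((feats_train - mu) / sigma, y_train)

preds = ((((feats_test - mu) / sigma) @ w + b) > 0).astype(int)
test_acc = float((preds == y_test).mean())
print(f"probe test accuracy: {test_acc:.2f}")
```

In a real experiment the filter bank would be replaced by features from a pretrained speech model, with the sensor signal prepared to match that model's expected input. The point of the sketch is only the division of labor: the encoder stays frozen, so the number of trained parameters is tiny, which is why probing suits data-scarce tasks.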

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00221/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00221/full.md

---
Source: https://tomesphere.com/paper/2509.00221