Efficient and Adaptive Human Activity Recognition via LLM Backbones
Aleksandr Bredikhin, Philippe Lalanda, German Vega

TL;DR
This paper reuses large pretrained language models as backbones for sensor-based human activity recognition (HAR), with the aim of reducing training cost and improving adaptability to new deployment conditions.
Contribution
The paper proposes reusing LLMs as generic temporal backbones for HAR, combined with a structured convolutional projection of the sensor signals and parameter-efficient adaptation, to enable rapid, data-efficient, and robust recognition.
Findings
Enables rapid convergence and strong data efficiency.
Achieves robust cross-dataset transfer in low-data settings.
Highlights the complementary roles of convolutional frontends and LLMs.
Abstract
Human Activity Recognition (HAR) is a core task in pervasive computing systems, where models must operate under strict computational constraints while remaining robust to heterogeneous and evolving deployment conditions. Recent advances based on Transformer architectures have significantly improved recognition performance, but typically rely on task-specific models trained from scratch, resulting in high training cost, large data requirements, and limited adaptability to domain shifts. In this paper, we propose a paradigm shift that reuses large pretrained language models (LLMs) as generic temporal backbones for sensor-based HAR, instead of designing domain-specific Transformers. To bridge the modality gap between inertial time series and language models, we introduce a structured convolutional projection that maps multivariate accelerometer and gyroscope signals into the latent space…
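As a rough illustration of the convolutional projection idea described in the abstract, the sketch below turns one window of six-channel inertial data (three accelerometer axes and three gyroscope axes) into a short sequence of token embeddings using a strided 1D convolution. It is not the authors' implementation: the function name `conv_project`, the window length of 128 samples, the kernel width of 16, the stride of 8, and the embedding size of 64 are all assumptions chosen for readability, and the weights are random rather than learned.

```python
import numpy as np


def conv_project(window, weight, bias, stride):
    """Strided 1D convolution that turns a sensor window into a token sequence.

    window: (T, C) array, T time steps of C inertial channels
    weight: (K, C, D) array, kernel of width K mapping C channels to D dims
    bias:   (D,) array
    returns (N, D) array with N = (T - K) // stride + 1 tokens
    """
    T, C = window.shape
    K, C_w, D = weight.shape
    assert C == C_w, "channel mismatch between window and kernel"
    n_tokens = (T - K) // stride + 1
    tokens = np.empty((n_tokens, D))
    for i in range(n_tokens):
        # Slice K consecutive samples and contract over time and channel axes.
        patch = window[i * stride : i * stride + K]  # shape (K, C)
        tokens[i] = np.tensordot(patch, weight, axes=([0, 1], [0, 1])) + bias
    return tokens


# Hypothetical sizes, not taken from the paper.
rng = np.random.default_rng(0)
window = rng.standard_normal((128, 6))            # accel (x, y, z) + gyro (x, y, z)
weight = rng.standard_normal((16, 6, 64)) * 0.02  # random stand-in for learned weights
bias = np.zeros(64)

tokens = conv_project(window, weight, bias, stride=8)
print(tokens.shape)  # (15, 64)
```

In a full pipeline, the embedding size would have to match the hidden size of the chosen language model so that the tokens can be passed to it in place of word embeddings. Which backbone is used and how it is adapted is not visible in the truncated abstract, so those steps are left out here.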
