TENT: Connect Language Models with IoT Sensors for Zero-Shot Activity Recognition
Yunjiao Zhou, Jianfei Yang, Han Zou, Lihua Xie

TL;DR
This paper introduces TENT, a novel method that aligns language models with IoT sensor data to enable zero-shot human activity recognition, effectively recognizing unseen actions by connecting textual semantics with multi-modal sensor signals.
Contribution
TENT is the first approach to jointly align textual embeddings with IoT sensor signals, enabling zero-shot activity recognition across multiple sensor modalities.
Findings
Achieves state-of-the-art zero-shot HAR performance.
Outperforms the best vision-language models by over 12% on zero-shot HAR.
Successfully recognizes unseen human activities.
Abstract
Recent achievements in language models have showcased their extraordinary capabilities in bridging visual information with semantic language understanding. This leads us to a novel question: can language models connect textual semantics with IoT sensory signals to perform recognition tasks, e.g., Human Activity Recognition (HAR)? If so, an intelligent HAR system with human-like cognition can be built, capable of adapting to new environments and unseen categories. This paper explores its feasibility with an innovative approach, IoT-sEnsors-language alignmEnt pre-Training (TENT), which jointly aligns textual embeddings with IoT sensor signals, including camera video, LiDAR, and mmWave. Through the IoT-language contrastive learning, we derive a unified semantic feature space that aligns multi-modal features with language embeddings, so that the IoT data corresponds to specific words that…
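The contrastive alignment described in the abstract can be illustrated with a small sketch. The excerpt above does not give TENT's encoders, loss formulation, or hyperparameters, so the code below assumes a generic CLIP-style symmetric InfoNCE loss over already-computed embeddings, followed by zero-shot prediction via cosine similarity to class-name text embeddings. All function names, the temperature value, and the embedding shapes are hypothetical and chosen only for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale each embedding to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(sensor_emb, text_emb, temperature=0.07):
    # Assumed symmetric InfoNCE: row i of sensor_emb is paired with row i of text_emb.
    # temperature=0.07 is an illustrative default, not a value from the paper.
    s = l2_normalize(sensor_emb)
    t = l2_normalize(text_emb)
    logits = s @ t.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(s))
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # sensor -> text
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> sensor
    return 0.5 * (loss_s2t + loss_t2s)

def zero_shot_predict(sensor_emb, class_text_emb):
    # Assign each sensor sample to the class whose text embedding is most similar.
    sims = l2_normalize(sensor_emb) @ l2_normalize(class_text_emb).T
    return sims.argmax(axis=1)
```

In this sketch, an unseen activity is handled by embedding its name (or a prompt containing it) with the text encoder and appending it to `class_text_emb`; no sensor-side retraining is involved. Whether TENT follows exactly this recipe is not stated in the excerpt.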
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
