Self-Supervised Multimodal Fusion Transformer for Passive Activity Recognition
Armand K. Koupai, Mohammud J. Bocus, Raul Santos-Rodriguez, Robert J. Piechocki, Ryan McConville

TL;DR
This paper introduces a self-supervised multimodal fusion transformer for passive Wi-Fi-based activity recognition, demonstrating high accuracy with limited labeled data and efficient resource use.
Contribution
It proposes a novel attention-based Fusion Transformer for multimodal sensor fusion and a self-supervised learning framework that enhances activity recognition performance.
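As a rough illustration of what attention-based fusion across modalities can look like, the sketch below applies generic single-head scaled dot-product self-attention to a concatenated sequence of feature tokens from two sensors. It is not the paper's Fusion Transformer: the function name `attention_fuse`, the token counts, the embedding size, and the random projection matrices are all assumptions made for the example.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over all tokens.

    Hypothetical helper: every token (from any modality) attends to
    every other token, so information is mixed across modalities.
    """
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
d = 16                                     # assumed embedding size
pwr_tokens = rng.standard_normal((4, d))   # stand-in for PWR feature tokens
csi_tokens = rng.standard_normal((6, d))   # stand-in for CSI feature tokens

# Fuse by concatenating along the sequence axis, then attending jointly.
tokens = np.concatenate([pwr_tokens, csi_tokens], axis=0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
fused, weights = attention_fuse(tokens, Wq, Wk, Wv)
```

Here `weights[i, j]` indicates how strongly token `i` draws on token `j`, so entries linking a PWR row to a CSI column correspond to cross-modal mixing.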
Findings
Achieves 95.9% F1-score with SSL
Outperforms ResNet with fewer resources
Excels with minimal labeled data (1-20%)
Abstract
The pervasiveness of Wi-Fi signals provides significant opportunities for human sensing and activity recognition in fields such as healthcare. The sensors most commonly used for passive Wi-Fi sensing are based on passive Wi-Fi radar (PWR) and channel state information (CSI) data; however, current systems do not effectively exploit the information acquired through multiple sensors to recognise the different activities. In this paper, we explore new properties of the Transformer architecture for multimodal sensor fusion. We study different signal processing techniques to extract multiple image-based features from PWR and CSI data such as spectrograms, scalograms and Markov transition field (MTF). We first propose the Fusion Transformer, an attention-based model for multimodal and multi-sensor fusion. Experimental results show that our Fusion Transformer approach can achieve competitive…
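To make one of the image-based features named in the abstract concrete, here is a minimal sketch of a Markov transition field computed from a 1-D signal with NumPy. It follows the standard MTF construction (quantile binning, a first-order transition matrix, then a lookup for every pair of time steps). The function name, the bin count, and the synthetic test signal are assumptions for illustration; the paper's actual preprocessing of PWR and CSI data is not described in this summary.

```python
import numpy as np

def markov_transition_field(x, n_bins=8):
    """Return an (n, n) Markov transition field for a 1-D series x."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges, so each bin holds roughly the same number of samples.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)  # bin index per sample, in 0..n_bins-1

    # Count transitions between the bins of consecutive samples.
    W = np.zeros((n_bins, n_bins))
    np.add.at(W, (q[:-1], q[1:]), 1.0)

    # Row-normalise counts into transition probabilities (empty rows stay 0).
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

    # MTF[i, j] = probability of moving from the bin of x[i] to the bin of x[j].
    return W[q[:, None], q[None, :]]

# Synthetic stand-in for a sensor time series (not real PWR or CSI data).
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * rng.standard_normal(256)
mtf = markov_transition_field(signal, n_bins=8)
```

The resulting `mtf` array can be treated as a single-channel image, which is what allows a time series to be passed to an image-style model.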
Taxonomy
Topics: Indoor and Outdoor Localization Technologies · Distributed Sensor Networks and Detection Algorithms · Speech and Audio Processing
Methods: Attention Is All You Need · Linear Layer · 1x1 Convolution · Adam · Softmax · Residual Connection · Position-Wise Feed-Forward Layer · Kaiming Initialization · Bottleneck Residual Block
