Self-Supervised Multimodal Fusion Transformer for Passive Activity Recognition

Armand K. Koupai; Mohammud J. Bocus; Raul Santos-Rodriguez; Robert J. Piechocki; Ryan McConville

arXiv:2209.03765·eess.SP·September 9, 2022

Open Access

TL;DR

This paper introduces a self-supervised multimodal fusion transformer for passive Wi-Fi-based activity recognition, demonstrating high accuracy with limited labeled data and efficient resource use.

Contribution

It proposes a novel attention-based Fusion Transformer for multimodal sensor fusion and a self-supervised learning framework that enhances activity recognition performance.

Findings

01. Achieves 95.9% F1-score with SSL

02. Outperforms ResNet with fewer resources

03. Excels with minimal labeled data (1-20%)

Abstract

The pervasiveness of Wi-Fi signals provides significant opportunities for human sensing and activity recognition in fields such as healthcare. The sensors most commonly used for passive Wi-Fi sensing are based on passive Wi-Fi radar (PWR) and channel state information (CSI) data. However, current systems do not effectively exploit the information acquired through multiple sensors to recognise the different activities. In this paper, we explore new properties of the Transformer architecture for multimodal sensor fusion. We study different signal processing techniques to extract multiple image-based features from PWR and CSI data, such as spectrograms, scalograms and Markov transition fields (MTF). We first propose the Fusion Transformer, an attention-based model for multimodal and multi-sensor fusion. Experimental results show that our Fusion Transformer approach can achieve competitive…
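
As a rough illustration of one of the image-based features named in the abstract, the sketch below computes a Markov transition field from a one-dimensional signal using the standard textbook definition. It is not taken from the paper: the function name `markov_transition_field`, the quantile binning, and the default `n_bins=8` are assumptions made here for illustration, and the authors' actual preprocessing of PWR and CSI data may differ.

```python
import numpy as np


def markov_transition_field(x, n_bins=8):
    """Return the (len(x), len(x)) Markov transition field of a 1-D series.

    Hypothetical helper for illustration; binning scheme and n_bins are
    assumptions, not settings reported in the paper.
    """
    x = np.asarray(x, dtype=float)

    # Interior quantile edges, so each bin holds roughly the same number of samples.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    states = np.digitize(x, edges)  # integer state per sample, in [0, n_bins - 1]

    # Count first-order transitions state[t] -> state[t + 1].
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (states[:-1], states[1:]), 1)

    # Row-normalise counts into transition probabilities; rows with no
    # outgoing transitions stay at zero.
    row_sums = counts.sum(axis=1, keepdims=True)
    transition = np.divide(
        counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0
    )

    # Entry (i, j) is the probability of moving from the state at time i
    # to the state at time j.
    return transition[states[:, None], states[None, :]]


if __name__ == "__main__":
    signal = np.sin(np.linspace(0, 4 * np.pi, 64))
    field = markov_transition_field(signal, n_bins=4)
    print(field.shape)
```

The resulting square array can be treated as a single-channel image, which is what would allow it to sit alongside spectrograms and scalograms as input to an image-based model.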

Taxonomy

Topics: Indoor and Outdoor Localization Technologies · Distributed Sensor Networks and Detection Algorithms · Speech and Audio Processing

Methods: Attention Is All You Need · Linear Layer · 1x1 Convolution · Adam · Softmax · Residual Connection · Position-Wise Feed-Forward Layer · Kaiming Initialization · Bottleneck Residual Block