MHARFedLLM: Multimodal Human Activity Recognition Using Federated Large Language Model

Asmit Bandyopadhyay; Rohit Basu; Tanmay Sen; Swagatam Das

arXiv:2508.01701·cs.LG·August 5, 2025

TL;DR

This paper introduces FedTime-MAGNET, a multimodal federated learning framework utilizing a novel graph attention transformer architecture and time series LLMs to enhance human activity recognition accuracy and robustness across heterogeneous data sources.

Contribution

The work presents MAGNET, a new multimodal fusion architecture with graph attention and Mixture of Experts, integrated into a federated learning framework for improved HAR performance.
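
To make the fusion idea concrete, here is a minimal sketch of attention over a graph of modality nodes followed by a softly gated Mixture of Experts. It is illustrative only and is not the authors' MAGNET implementation: scaled dot-product attention over a fully connected graph stands in for the paper's graph attention (whose exact form is not given on this page), and the names `attention_fuse`, `mixture_of_experts`, the toy 2-dimensional embeddings, and the two hand-written experts are all assumptions.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(nodes: List[Vector]) -> List[Vector]:
    """One attention pass where every modality node attends to all nodes.

    Assumption: scaled dot-product scores on a fully connected graph,
    used here as a stand-in for the paper's graph attention.
    """
    dim = len(nodes[0])
    scale = math.sqrt(dim)
    fused = []
    for query in nodes:
        alphas = softmax([dot(query, key) / scale for key in nodes])
        fused.append(
            [sum(a * key[d] for a, key in zip(alphas, nodes)) for d in range(dim)]
        )
    return fused


def mixture_of_experts(
    x: Vector,
    experts: List[Callable[[Vector], Vector]],
    gate_vectors: List[Vector],
) -> Vector:
    """Softly gated mixture: each expert's output is weighted by its gate."""
    gates = softmax([dot(x, g) for g in gate_vectors])
    outputs = [expert(x) for expert in experts]
    return [sum(g * out[d] for g, out in zip(gates, outputs)) for d in range(len(x))]


# Hypothetical per-modality embeddings (depth camera, pressure mat, accelerometer).
depth, pressure, accel = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
fused = attention_fuse([depth, pressure, accel])

# Mean-pool the fused nodes into a single vector, then route it through two toy experts.
pooled = [sum(v[d] for v in fused) / len(fused) for d in range(2)]
experts = [lambda v: [2.0 * x for x in v], lambda v: [-x for x in v]]
embedding = mixture_of_experts(pooled, experts, gate_vectors=[[1.0, 0.0], [0.0, 1.0]])
```

In a trained model the attention scores, gate vectors, and experts would be learned parameters; the fixed values here only show how data flows from per-modality embeddings to a single fused embedding.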

Findings

01. Achieved a centralized F1 score of 0.934.

02. Achieved a federated F1 score of 0.881.

03. Demonstrated significant performance improvements over existing methods.

Abstract

Human Activity Recognition (HAR) plays a vital role in applications such as fitness tracking, smart homes, and healthcare monitoring. Traditional HAR systems often rely on single modalities, such as motion sensors or cameras, limiting robustness and accuracy in real-world environments. This work presents FedTime-MAGNET, a novel multimodal federated learning framework that advances HAR by combining heterogeneous data sources: depth cameras, pressure mats, and accelerometers. At its core is the Multimodal Adaptive Graph Neural Expert Transformer (MAGNET), a fusion architecture that uses graph attention and a Mixture of Experts to generate unified, discriminative embeddings across modalities. To capture complex temporal dependencies, a lightweight T5 encoder-only architecture is customized and adapted within this framework. Extensive experiments show that FedTime-MAGNET significantly…
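
The federated side of the framework can be pictured with a small aggregation sketch. The abstract does not state which aggregation rule FedTime-MAGNET uses, so the sample-size-weighted averaging below (in the style of FedAvg) is an assumption, as are the function name `fed_avg`, the parameter names `encoder` and `head`, and the toy values.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def fed_avg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    Assumption: FedAvg-style aggregation; the paper's actual rule may differ.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical clients holding 1 and 3 local samples respectively.
clients = [
    {"encoder": [1.0, 2.0], "head": [0.0]},
    {"encoder": [3.0, 4.0], "head": [1.0]},
]
global_weights = fed_avg(clients, client_sizes=[1, 3])
print(global_weights)  # {'encoder': [2.5, 3.5], 'head': [0.75]}
```

Only parameters leave each client in this scheme; the raw depth, pressure, and accelerometer recordings stay local, which is the usual motivation for federated training on sensor data.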

Taxonomy

Topics: Context-Aware Activity Recognition Systems · Human Pose and Action Recognition · Multimodal Machine Learning Applications