ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation

Tiantian Feng; Tuo Zhang; Salman Avestimehr; Shrikanth S. Narayanan

arXiv:2408.15803·eess.AS·August 29, 2024

Open Access

TL;DR

ModalityMirror enhances audio classification in multimodal federated learning by using knowledge distillation to address client modality heterogeneity, significantly outperforming existing methods, especially when video data is missing.

Contribution

Introduces ModalityMirror, a novel two-phase approach combining modality-wise federated learning and knowledge distillation to improve unimodal audio performance in multimodal federated learning.
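
As a rough illustration of the first phase, the sketch below averages each unimodal encoder only over the clients that actually hold that modality. The sample-weighted (FedAvg-style) averaging rule, the function name `aggregate_by_modality`, and the flattened-parameter representation are assumptions made for this example; the summary above only states that unimodal encoders are aggregated modality-wise.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: modality name -> flattened encoder parameters.
Encoders = Dict[str, List[float]]


def aggregate_by_modality(
    client_updates: List[Tuple[int, Encoders]],
) -> Encoders:
    """Sample-weighted average of each encoder over the clients that have it.

    client_updates holds one (num_samples, encoders) pair per client. A client
    missing a modality simply omits that key and is skipped for that encoder.
    """
    weighted_sums: Dict[str, List[float]] = {}
    sample_totals: Dict[str, int] = {}

    for num_samples, encoders in client_updates:
        for modality, params in encoders.items():
            if modality not in weighted_sums:
                weighted_sums[modality] = [0.0] * len(params)
                sample_totals[modality] = 0
            sample_totals[modality] += num_samples
            for i, value in enumerate(params):
                weighted_sums[modality][i] += num_samples * value

    return {
        modality: [s / sample_totals[modality] for s in sums]
        for modality, sums in weighted_sums.items()
    }


# One audiovisual client and one audio-only client (video missing).
updates = [
    (10, {"audio": [1.0, 2.0], "video": [0.0]}),
    (30, {"audio": [3.0, 4.0]}),
]
global_encoders = aggregate_by_modality(updates)
```

In this toy setup the audio encoder is averaged over both clients, while the video encoder is taken from the single client that has video.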

Findings

01 Significantly improves audio classification accuracy.

02 Outperforms state-of-the-art methods like Harmony.

03 Effective in scenarios with missing video data.

Abstract

Multimodal federated learning (FL) frequently encounters client modality heterogeneity, which degrades performance on the secondary modality in multimodal learning. The problem is particularly prevalent in audiovisual learning, where audio is often assumed to be the weaker modality in recognition tasks. To address this challenge, we introduce ModalityMirror, which improves audio model performance by leveraging knowledge distillation from an audiovisual federated learning model. ModalityMirror involves two phases: a modality-wise FL stage that aggregates unimodal encoders, and a federated knowledge distillation stage on multimodal clients that trains a unimodal student model. Our results demonstrate that ModalityMirror significantly improves audio classification compared to state-of-the-art FL methods such as Harmony, particularly in audiovisual FL with missing video. Our…
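
To make the distillation stage more concrete, here is a minimal sketch of a temperature-scaled distillation loss between an audiovisual teacher and an audio-only student. The abstract does not specify the loss, so the KL-divergence form, the temperature value, and the names `softmax` and `distillation_loss` are assumptions chosen to reflect the common knowledge distillation recipe, not the authors' exact objective.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    temperature: float = 2.0,  # assumed value, not taken from the paper
) -> float:
    """KL(teacher || student) on softened class distributions.

    The T^2 factor is the usual rescaling that keeps gradient magnitudes
    comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    return temperature ** 2 * kl


# Hypothetical logits for one sample on a multimodal client:
# the teacher sees audio and video, the student sees audio only.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge, which is the signal the unimodal student would be trained to minimize on clients that hold both modalities.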

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Knowledge Distillation