Cuing Without Sharing: A Federated Cued Speech Recognition Framework via Mutual Knowledge Distillation

Yuxuan Zhang; Lei Liu; Li Liu

arXiv:2308.03432 · cs.MM · August 8, 2023

TL;DR

This paper introduces a privacy-preserving federated learning framework for automatic cued speech recognition. It uses mutual knowledge distillation to train on decentralized data without sharing sensitive face and hand-gesture videos.

Contribution

The paper proposes a federated cued speech recognition framework (FedCSR) built on mutual knowledge distillation, which maintains cross-modal semantic consistency over Non-IID data and protects privacy in decentralized settings.

Findings

01. Outperforms federated learning baselines and centralized methods.

02. Achieves improvements of 9.7% in character error rate (CER) and 15.0% in word error rate (WER).

03. First federated approach for ACSR with privacy considerations.
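The CER and WER figures above are edit-distance metrics. The sketch below shows how such rates are commonly computed: the Levenshtein distance between reference and hypothesis, divided by the reference length. It is a generic illustration, not the paper's evaluation code. The function names are hypothetical, and the tokenization (whitespace-split words, characters with spaces removed) is an assumption; how the paper tokenizes characters and words is not stated here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(reference, hypothesis):
    """Character error rate; spaces are dropped (an assumption)."""
    return error_rate(list(reference.replace(" ", "")),
                      list(hypothesis.replace(" ", "")))


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens (an assumption)."""
    return error_rate(reference.split(), hypothesis.split())
```

For example, `wer("the cat sat", "the cat sit")` counts one substituted word out of three, while `cer` on the same pair counts one substituted character out of nine. Lower is better for both metrics.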

Abstract

Cued Speech (CS) is a visual coding system that encodes spoken languages at the phonetic level, combining lip-reading and hand gestures to assist communication among people with hearing impairments. The Automatic CS Recognition (ACSR) task aims to transcribe CS videos into text, and involves lips and hands as two distinct modalities conveying complementary information. However, the traditional centralized training approach poses potential privacy risks because CS data consists of facial and gesture videos. To address this issue, we propose a new Federated Cued Speech Recognition (FedCSR) framework to train an ACSR model over decentralized CS data without sharing private information. In particular, a mutual knowledge distillation method is proposed to maintain cross-modal semantic consistency of the Non-IID CS data, which ensures learning a unified…
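To make the two ingredients named in the abstract concrete, here is a minimal sketch of (a) a generic mutual distillation loss, in which two models are each pulled toward the other's softened predictions, and (b) size-weighted parameter averaging as used in standard federated averaging. This is not the FedCSR method, whose details are not given on this page. All function names, the temperature value, and the flat list representation of model weights are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_distillation_loss(logits_a, logits_b, temperature=2.0):
    """Return (loss for model A, loss for model B).

    Each model treats the other's softened output as a fixed target,
    so in a real training loop the target side would be detached.
    """
    p_a = softmax(logits_a, temperature)
    p_b = softmax(logits_b, temperature)
    loss_a = kl_divergence(p_b, p_a)  # pulls A toward B
    loss_b = kl_divergence(p_a, p_b)  # pulls B toward A
    return loss_a, loss_b


def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    Only parameters are aggregated; raw client data never leaves the client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(n_params)
    ]
```

In this sketch, a server would call `federated_average` once per communication round on the parameter vectors returned by the clients, while `mutual_distillation_loss` would be added to each model's task loss during training. Where the distillation happens in FedCSR (between modalities, between local and global models, or elsewhere) is not specified in the text above.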

Code & Models

Repositories

yuxuanzhang0713/fedcsr (PyTorch, official)

Taxonomy

Topics: Hand Gesture Recognition Systems · Speech and Audio Processing · Hearing Impairment and Communication