Cuing Without Sharing: A Federated Cued Speech Recognition Framework via Mutual Knowledge Distillation
Yuxuan Zhang, Lei Liu, Li Liu

TL;DR
This paper introduces a privacy-preserving federated learning framework for automatic cued speech recognition that leverages mutual knowledge distillation to effectively utilize decentralized data without sharing sensitive videos.
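The key privacy idea is that clients train locally and share only model parameters, never the raw videos. A FedAvg-style weighted average illustrates this. The sketch makes several assumptions. It uses plain FedAvg aggregation, although the summary does not say which aggregation rule FedCSR uses. The function name `fedavg` and the dict-of-lists parameter format are hypothetical.

```python
from typing import Dict, List

# Hypothetical parameter format: layer name -> flat list of weights.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """FedAvg-style aggregation (assumed, not confirmed as FedCSR's rule).

    Each client contributes its locally trained parameters, weighted by its
    number of training samples. Only these numbers reach the server; the
    clients' face and hand videos stay local.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two clients with 1 and 3 local samples.
global_model = fedavg([{"w": [0.0, 4.0]}, {"w": [4.0, 0.0]}], [1, 3])
print(global_model)  # {'w': [3.0, 1.0]}
```

Weighting by sample count gives larger clients more influence. With Non-IID data, that influence can bias the global model toward their distributions, and this is the kind of heterogeneity the paper's distillation step is meant to address.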
Contribution
It proposes a novel federated cued speech recognition framework with mutual knowledge distillation, enabling cross-modal semantic consistency and privacy protection in decentralized data settings.
Findings
Outperforms federated learning baselines and centralized methods.
Improves character error rate (CER) by 9.7% and word error rate (WER) by 15.0%.
First federated approach for ACSR with privacy considerations.
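CER and WER are conventionally computed as the Levenshtein edit distance between the reference and the hypothesis, divided by the reference length. CER counts over characters and WER over words. The sketch below assumes that standard definition; the summary does not spell out the paper's exact scoring, nor whether the 9.7% and 15.0% figures are absolute or relative. The helper names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) over tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference token missing in hypothesis
                curr[j - 1] + 1,         # insertion: extra token in hypothesis
                prev[j - 1] + (r != h),  # substitution (cost 0 if tokens match)
            )
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length (CER or WER by tokenization)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


cer = error_rate(list("abcd"), list("abed"))                  # character-level
wer = error_rate("the cat sat".split(), "the cat sit".split())  # word-level
print(cer, wer)
```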
Abstract
Cued Speech (CS) is a visual coding system that encodes spoken languages at the phonetic level. It combines lip-reading with hand gestures to help people with hearing impairments communicate. The Automatic CS Recognition (ACSR) task aims to transcribe CS videos into linguistic text. The task involves two distinct modalities, lips and hands, which convey complementary information. However, the traditional centralized training approach poses potential privacy risks because CS data consist of facial and gesture videos. To address this issue, we propose a new Federated Cued Speech Recognition (FedCSR) framework that trains an ACSR model on decentralized CS data without sharing private information. In particular, a mutual knowledge distillation method is proposed to maintain cross-modal semantic consistency of the Non-IID CS data, which ensures learning a unified…
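In its generic form (deep mutual learning), mutual knowledge distillation has two models each minimize a KL divergence toward the other's softened prediction. The sketch below applies that generic form to hypothetical lip and hand branches to show how cross-modal agreement can be encouraged. The abstract is truncated before the method details, so several elements are assumptions rather than FedCSR's actual loss: the branch names, the temperature of 2.0, and the symmetric-KL formulation.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened softmax (numerically stabilized by subtracting the max)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_distillation_loss(
    lip_logits: List[float], hand_logits: List[float], temperature: float = 2.0
) -> float:
    """Generic mutual-distillation term between two modality branches (assumed form).

    The lip branch is pulled toward the hand branch's prediction, KL(p_hand || p_lip),
    and vice versa. In an autograd framework each target distribution would be
    detached (stop-gradient); here we only compute the scalar value. The T^2 factor
    is the usual rescaling used with temperature-softened distillation.
    """
    p_lip = softmax(lip_logits, temperature)
    p_hand = softmax(hand_logits, temperature)
    return temperature ** 2 * (kl_divergence(p_hand, p_lip) + kl_divergence(p_lip, p_hand))


# Branches that agree incur no penalty; disagreeing branches are penalized.
print(mutual_distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))
print(mutual_distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0]))
```

A symmetric term treats neither modality as a fixed teacher. This fits the abstract's description of lips and hands as complementary sources rather than a strong/weak pair.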
Taxonomy
Topics: Hand Gesture Recognition Systems · Speech and Audio Processing · Hearing Impairment and Communication
