Emotion Filtering at the Edge
Ranya Aloufi, Hamed Haddadi, David Boyle

TL;DR
This paper presents a privacy-preserving method for emotion filtering in voice inputs at the edge using CycleGAN, effectively reducing emotional state identification while maintaining speech recognition accuracy on low-cost devices.
Contribution
The authors introduce an edge-based emotion filtering approach using CycleGAN to protect user privacy without sacrificing speech recognition performance.
Findings
Emotion identification reduced by ~91%
Speech recognition accuracy differs by only ~0.16% from cloud-based methods
Effective implementation on Raspberry Pi 4
Abstract
Voice-controlled devices and services have become very popular in the consumer IoT. Cloud-based speech analysis services extract information from voice inputs using speech recognition techniques. Service providers can thus build very accurate profiles of users' demographic categories, personal preferences, emotional states, etc., and may therefore significantly compromise their privacy. To address this problem, we have developed a privacy-preserving intermediate layer between users and cloud services to sanitize voice input directly at edge devices. We use CycleGAN-based speech conversion to remove sensitive information from raw voice input signals before regenerating neutralized signals for forwarding. We implement and evaluate our emotion filtering approach using a relatively cheap Raspberry Pi 4, and show that performance accuracy is not compromised at the edge. In fact, signals…
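As a rough illustration of the CycleGAN idea mentioned in the abstract, the sketch below computes the standard L1 cycle-consistency term for a pair of mappings between an "emotional" and a "neutral" feature domain. It is not the authors' implementation: the generator functions `to_neutral` and `to_emotional`, the 24-dimensional feature frames, and the toy scaling maps are all assumptions made for the example, and the paper's actual features and loss weights are not given here.

```python
import numpy as np


def cycle_consistency_loss(x, y, g_xy, g_yx):
    """L1 cycle loss: mapping x to the other domain and back should recover x,
    and likewise for y."""
    forward = np.mean(np.abs(g_yx(g_xy(x)) - x))
    backward = np.mean(np.abs(g_xy(g_yx(y)) - y))
    return forward + backward


# Hypothetical stand-ins for trained generators: simple invertible scalings.
def to_neutral(feats):
    return feats * 0.5


def to_emotional(feats):
    return feats * 2.0


# A deliberately mismatched mapping, to show a non-zero cycle loss.
def bad_inverse(feats):
    return feats * 3.0


rng = np.random.default_rng(0)
emotional = rng.normal(size=(4, 24))  # 4 frames of assumed 24-dim features
neutral = rng.normal(size=(4, 24))

matched_loss = cycle_consistency_loss(emotional, neutral, to_neutral, to_emotional)
mismatched_loss = cycle_consistency_loss(emotional, neutral, to_neutral, bad_inverse)
```

With generators that invert each other the cycle term vanishes, while a mismatched pair is penalized. In a full CycleGAN this term is combined with adversarial losses that push converted features toward the target domain.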
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
