Discriminatory and orthogonal feature learning for noise robust keyword spotting
Donghyeon Kim, Kyungdeuk Ko, David K. Han, Hanseok Ko

TL;DR
This paper introduces the LOw Variant Orthogonal (LOVO) loss, which combines a triplet loss, a spectral norm-based orthogonal loss, and an inner class distance loss to improve noise robustness and discriminatory feature learning in lightweight keyword spotting (KWS) models for smart devices.
Contribution
The paper proposes LOVO loss, a novel combination of losses, to improve noise robustness and feature discrimination in resource-constrained KWS systems.
Findings
LOVO loss improves KWS performance in low-SNR conditions.
It enhances feature discrimination in noisy environments.
It maintains a small model footprint while improving robustness.
Abstract
Keyword Spotting (KWS) is an essential component in a smart device for alerting the system when a user prompts it with a command. As these devices are typically constrained by computational and energy resources, the KWS model should be designed with a small footprint. In our previous work, we developed lightweight dynamic filters which extract a robust feature map within a noisy environment. The learning variables of the dynamic filter are jointly optimized with KWS weights by using Cross-Entropy (CE) loss. CE loss alone, however, is not sufficient for high performance when the SNR is low. In order to train the network for more robust performance in noisy environments, we introduce the LOw Variant Orthogonal (LOVO) loss. The LOVO loss is composed of a triplet loss applied on the output of the dynamic filter, a spectral norm-based orthogonal loss, and an inner class distance loss applied…
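To make the three ingredients named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' formulation: the abstract is truncated, so the exact definitions, the layers each term is applied to, and how the terms are weighted or combined with the CE loss are not stated here. The function names, the margin, the use of the spectral norm of W^T W - I as the orthogonality penalty, and the weighted sum with weights `alpha`, `beta`, `gamma` are all illustrative assumptions.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss for one triple.

    Pulls the anchor toward the same-class sample and pushes it away
    from the other-class sample by at least `margin` (assumed value).
    """
    d_ap = math.sqrt(sq_dist(anchor, positive))
    d_an = math.sqrt(sq_dist(anchor, negative))
    return max(0.0, d_ap - d_an + margin)


def inner_class_distance(features):
    """Mean squared distance of same-class features to their centroid."""
    dim = len(features[0])
    count = len(features)
    centroid = [sum(f[i] for f in features) / count for i in range(dim)]
    return sum(sq_dist(f, centroid) for f in features) / count


def spectral_orthogonal_penalty(W, iters=100):
    """Spectral norm of (W^T W - I), estimated by power iteration.

    W is a list of rows. The penalty is 0 when the columns of W are
    orthonormal. This is an estimate: power iteration can miss the
    dominant direction if the start vector is orthogonal to it.
    """
    n = len(W[0])
    # Symmetric Gram residual G = W^T W - I.
    G = [[sum(row[i] * row[j] for row in W) - (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Deterministic, non-uniform start vector, normalised to unit length.
    v = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    sigma = 0.0
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        sigma = math.sqrt(sum(x * x for x in w))
        if sigma == 0.0:
            return 0.0
        v = [x / sigma for x in w]
    return sigma


def combined_loss(trip, orth, inner, alpha=1.0, beta=1.0, gamma=1.0):
    """Hypothetical weighted sum of the three terms (weights assumed)."""
    return alpha * trip + beta * orth + gamma * inner


# Toy usage with hand-made 2-D features and a 2x2 weight matrix.
trip = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0], margin=1.0)
inner = inner_class_distance([[0.0, 0.0], [2.0, 0.0]])
orth = spectral_orthogonal_penalty([[2.0, 0.0], [0.0, 1.0]])
total = combined_loss(trip, orth, inner)
```

In this toy setup each term is zero exactly when its goal is met: the triplet term when the negative is at least a margin farther from the anchor than the positive, the inner class term when all same-class features coincide, and the orthogonal term when the weight columns are orthonormal.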
Taxonomy
Methods: Triplet Loss
