Temporal Order Preserved Optimal Transport-based Cross-modal Knowledge Transfer Learning for ASR
Xugang Lu, Peng Shen, Yu Tsao, and Hisashi Kawai

TL;DR
This paper introduces a temporal-order-preserving optimal transport method for cross-modal knowledge transfer in ASR. The method aligns acoustic and linguistic features while maintaining their temporal order, which improves recognition performance.
Contribution
The paper proposes TOT-CAKT, a new optimal transport-based method that preserves temporal order during feature alignment, enhancing cross-modal knowledge transfer for ASR.
Findings
Significant improvement in Mandarin ASR performance.
Effective preservation of temporal order in feature alignment.
Outperforms several state-of-the-art models.
Abstract
Transferring linguistic knowledge from a pretrained language model (PLM) to an acoustic model has been shown to greatly improve the performance of automatic speech recognition (ASR). However, due to the heterogeneous feature distributions in cross-modalities, designing an effective model for feature alignment and knowledge transfer between linguistic and acoustic sequences remains a challenging task. Optimal transport (OT), which efficiently measures probability distribution discrepancies, holds great potential for aligning and transferring knowledge between acoustic and linguistic modalities. Nonetheless, the original OT treats acoustic and linguistic feature sequences as two unordered sets in alignment and neglects temporal order information during OT coupling estimation. Consequently, a time-consuming pretraining stage is required to learn a good alignment between the acoustic and…
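The contrast the abstract draws between plain OT, which treats both sequences as unordered sets, and an order-aware variant can be illustrated with a small sketch. This is not the authors' TOT-CAKT implementation. It assumes entropic OT solved by Sinkhorn iterations, a cosine-distance cost, and a simple quadratic penalty on how far a pairing sits from the diagonal in relative-position terms. All function names, the `order_weight` parameter, and the random features are hypothetical and chosen only for illustration.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iter=200):
    """Entropic OT via Sinkhorn iterations; returns a coupling matrix."""
    K = np.exp(-cost / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # match column marginals
        u = a / (K @ v)                  # match row marginals
    return u[:, None] * K * v[None, :]

def temporal_penalty(n, m):
    """Squared gap between relative positions of frame i and token j.

    Assumption: a quadratic distance-from-diagonal term stands in for
    whatever temporal prior the paper actually uses.
    """
    i = (np.arange(n) + 0.5) / n
    j = (np.arange(m) + 0.5) / m
    return (i[:, None] - j[None, :]) ** 2

def align(acoustic, linguistic, order_weight=0.0, reg=0.1):
    """Couple acoustic frames (n, d) with linguistic tokens (m, d)."""
    A = acoustic / np.linalg.norm(acoustic, axis=1, keepdims=True)
    L = linguistic / np.linalg.norm(linguistic, axis=1, keepdims=True)
    cost = 1.0 - A @ L.T                 # cosine distance, order-agnostic
    cost = cost + order_weight * temporal_penalty(len(A), len(L))
    a = np.full(len(A), 1.0 / len(A))    # uniform mass over frames
    b = np.full(len(L), 1.0 / len(L))    # uniform mass over tokens
    return sinkhorn(cost, a, b, reg)

# Hypothetical data: 12 acoustic frames and 4 token embeddings, 8 dims each.
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(12, 8))
linguistic = rng.normal(size=(4, 8))

plain_plan = align(acoustic, linguistic, order_weight=0.0)
ordered_plan = align(acoustic, linguistic, order_weight=5.0)

# Average squared displacement from the diagonal under each coupling.
penalty = temporal_penalty(12, 4)
plain_shift = float((plain_plan * penalty).sum())
ordered_shift = float((ordered_plan * penalty).sum())
```

With `order_weight=0.0` the coupling depends only on feature similarity, so shuffling the frames would merely permute its rows. A positive `order_weight` moves transport mass toward pairings whose relative positions in the two sequences agree, which is the kind of temporal constraint the abstract says plain OT lacks.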
Taxonomy
Topics: Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
