Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals
Running Zhao, Jiangtao Yu, Hang Zhao, Edith C.H. Ngai

TL;DR
Radio2Text is an mmWave-based streaming speech recognition system that handles a large vocabulary (over 13,000 words) with low latency. It uses a tailored streaming Transformer, together with Guidance Initialization and cross-modal knowledge distillation, to improve accuracy.
Contribution
The paper presents the first mmWave-based streaming ASR system with a large vocabulary. It relies on two transfer-learning techniques: Guidance Initialization and cross-modal knowledge distillation.
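As a rough illustration of what knowledge distillation involves, the sketch below computes a generic response-based distillation loss: the KL divergence between temperature-softened teacher and student output distributions. This is the textbook formulation, not necessarily the loss used in Radio2Text; the function names, the temperature value, and the assumption that the teacher is a model trained on another modality (for example audio) are all illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper: the teacher is assumed to come from a different
    modality than the student (an assumption, not taken from the paper).
    """
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two distributions diverge. In practice it is usually combined with the ordinary supervised loss on the transcripts.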
Findings
Achieves a 5.7% character error rate on large-vocabulary recognition (vocabulary exceeding 13,000 words).
Demonstrates effective learning of speech features from mmWave signals.
Enhances streaming ASR performance with novel transfer learning methods.
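For readers unfamiliar with the metric, character error rate (CER) is conventionally the character-level edit distance between the recognized text and the reference, divided by the reference length. The sketch below shows that standard definition; the function names are illustrative and the paper's exact evaluation script is not described here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, recognizing "hello word" against the reference "hello world" is one deletion over 11 reference characters, a CER of about 9.1%.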
Abstract
Millimeter wave (mmWave) based speech recognition provides more possibility for audio-related applications, such as conference speech transcription and eavesdropping. However, considering the practicality in real scenarios, latency and recognizable vocabulary size are two critical factors that cannot be overlooked. In this paper, we propose Radio2Text, the first mmWave-based system for streaming automatic speech recognition (ASR) with a vocabulary size exceeding 13,000 words. Radio2Text is based on a tailored streaming Transformer that is capable of effectively learning representations of speech-related features, paving the way for streaming ASR with a large vocabulary. To alleviate the deficiency of streaming networks unable to access entire future inputs, we propose the Guidance Initialization that facilitates the transfer of feature knowledge related to the global context from the…
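To make the streaming constraint concrete, the sketch below builds a chunk-based attention mask, a common way to limit how much future input a streaming Transformer can see. It is an assumption for illustration only: the abstract does not say which masking scheme Radio2Text uses, and the parameters `chunk_size` and `left_chunks` are hypothetical.

```python
def streaming_attention_mask(num_frames: int, chunk_size: int, left_chunks: int):
    """Return mask[i][j] == True when frame i may attend to frame j.

    Each frame sees its own chunk plus `left_chunks` previous chunks,
    and never sees frames from later chunks.
    """
    mask = []
    for i in range(num_frames):
        chunk = i // chunk_size
        start = max(0, (chunk - left_chunks) * chunk_size)
        end = min(num_frames, (chunk + 1) * chunk_size)
        mask.append([start <= j < end for j in range(num_frames)])
    return mask
```

A non-streaming model corresponds to a mask that is True everywhere, which is the global context a streaming model lacks.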
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Softmax · Absolute Position Encodings · Residual Connection · Dense Connections · Layer Normalization · Dropout
