Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals

Running Zhao; Jiangtao Yu; Hang Zhao; Edith C.H. Ngai

arXiv:2308.08125 · cs.SD · August 17, 2023

TL;DR

Radio2Text is a streaming speech recognition system based on mmWave radio signals. It handles a large vocabulary at low latency, using a tailored streaming Transformer and cross-modal knowledge distillation to improve accuracy.

Contribution

The paper presents the first mmWave-based streaming ASR system with a large vocabulary, and introduces two transfer techniques to train it: Guidance Initialization and cross-modal knowledge distillation.

Findings

01. Achieves a 5.7% character error rate on a vocabulary exceeding 13,000 words.

02. Shows that a tailored streaming Transformer can learn speech-related features from mmWave signals.

03. Improves streaming ASR performance with Guidance Initialization and cross-modal knowledge distillation.
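For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how a character error rate is commonly computed: the character-level edit distance between the reference transcript and the recognized text, divided by the reference length. The function names are illustrative, and the paper's exact evaluation protocol (for example text normalization or handling of spaces) is not stated on this page, so treat those details as assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = character edit distance / number of reference characters.

    Assumption: spaces count as characters and no normalization is applied.
    """
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a 5.7% CER corresponds to roughly 57 character edits per 1,000 reference characters.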

Abstract

Millimeter wave (mmWave) based speech recognition provides more possibility for audio-related applications, such as conference speech transcription and eavesdropping. However, considering the practicality in real scenarios, latency and recognizable vocabulary size are two critical factors that cannot be overlooked. In this paper, we propose Radio2Text, the first mmWave-based system for streaming automatic speech recognition (ASR) with a vocabulary size exceeding 13,000 words. Radio2Text is based on a tailored streaming Transformer that is capable of effectively learning representations of speech-related features, paving the way for streaming ASR with a large vocabulary. To alleviate the deficiency of streaming networks unable to access entire future inputs, we propose the Guidance Initialization that facilitates the transfer of feature knowledge related to the global context from the…
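The limitation the abstract mentions, that a streaming network cannot access entire future inputs, can be illustrated with a chunk-wise attention mask. This is a generic sketch of one common streaming-Transformer scheme, not the mechanism Radio2Text is confirmed to use; the function name and the `chunk_size` and `left_chunks` parameters are hypothetical.

```python
def chunk_attention_mask(num_frames, chunk_size, left_chunks=1):
    """Build a boolean mask where mask[q][k] is True if frame q may attend to frame k.

    Assumed scheme (illustrative only): frames are grouped into fixed-size
    chunks; a frame sees its own chunk plus `left_chunks` previous chunks,
    and nothing from later chunks.
    """
    mask = []
    for q in range(num_frames):
        q_chunk = q // chunk_size
        row = []
        for k in range(num_frames):
            k_chunk = k // chunk_size
            # Allowed: chunks from (q_chunk - left_chunks) up to q_chunk.
            row.append(q_chunk - left_chunks <= k_chunk <= q_chunk)
        mask.append(row)
    return mask


# Example: 6 frames, chunks of 2 frames, one chunk of left context.
mask = chunk_attention_mask(6, 2, left_chunks=1)
```

A non-streaming model would use an all-True mask instead. The gap between the two is the global context that, per the abstract, Guidance Initialization is meant to help transfer to the streaming model.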

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Softmax · Absolute Position Encodings · Residual Connection · Dense Connections · Layer Normalization · Dropout