Improving Noise Robustness for Spoken Content Retrieval using Semi-supervised ASR and N-best Transcripts for BERT-based Ranking Models
Yasufumi Moriya, Gareth J. F. Jones

TL;DR
This paper explores semi-supervised and N-best transcript methods to improve BERT-based spoken content retrieval accuracy, significantly reducing the performance gap caused by ASR errors.
Contribution
It introduces semi-supervised ASR transcripts and N-best early fusion to make BERT-based ranking for spoken content retrieval more robust to ASR errors.
Findings
Semi-supervised ASR transcripts improved MRR by 2-5.5% over standard ASR transcripts.
N-best early fusion for BERT DR systems increased MRR by 3-4%.
Combining the two methods reduced the MRR gap between ASR and manual transcripts by over 50%.
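All of the findings above are reported in mean reciprocal rank (MRR). As a reminder of what the metric measures in a known-item search task, where each query has exactly one target item, here is a minimal sketch. The function names and the toy video IDs are illustrative assumptions, not taken from the paper.

```python
from typing import Hashable, Sequence


def reciprocal_rank(ranked_ids: Sequence[Hashable], target_id: Hashable) -> float:
    """Return 1/rank of the target item, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == target_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    target_ids: Sequence[Hashable],
) -> float:
    """Average the reciprocal rank over all queries (one target per query)."""
    if len(rankings) != len(target_ids):
        raise ValueError("rankings and target_ids must have the same length")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, t) for r, t in zip(rankings, target_ids))
    return total / len(rankings)


# Toy example (hypothetical IDs): three queries, one known item each.
rankings = [
    ["v3", "v1", "v2"],  # target v1 found at rank 2 -> 1/2
    ["v2", "v1", "v3"],  # target v2 found at rank 1 -> 1
    ["v1", "v2", "v3"],  # target v4 not retrieved   -> 0
]
targets = ["v1", "v2", "v4"]
mrr = mean_reciprocal_rank(rankings, targets)  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```

Because MRR rewards placing the single target item near the top, an ASR error that corrupts a key query term in the transcript can push the target down the ranking and lower the score noticeably.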
Abstract
BERT-based re-ranking and dense retrieval (DR) systems have been shown to improve search effectiveness for spoken content retrieval (SCR). However, both methods can still show a reduction in effectiveness when using ASR transcripts in comparison to accurate manual transcripts. We find that a known-item search task on the How2 dataset of spoken instruction videos shows a reduction in mean reciprocal rank (MRR) scores of 10-14%. As a potential method to reduce this disparity, we investigate the use of semi-supervised ASR transcripts and N-best ASR transcripts to mitigate ASR errors for spoken search using BERT-based ranking. Semi-supervised ASR transcripts brought 2-5.5% MRR improvements over standard ASR transcripts and our N-best early fusion methods for BERT DR systems improved MRR by 3-4%. Combining semi-supervised transcripts with N-best early fusion for BERT DR reduced the MRR gap…
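The abstract does not say how the N-best hypotheses are fused, so the following is only one plausible reading of "early fusion" for a dense retriever: the embeddings of the N-best transcripts of a video are merged into a single document vector before any query is scored. The weighted-mean operator, the function names, and the toy 2-dimensional vectors (standing in for BERT embeddings) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def fuse_nbest(
    embeddings: Sequence[Vector],
    weights: Optional[Sequence[float]] = None,
) -> Vector:
    """Merge N-best hypothesis embeddings into one vector (weighted mean).

    Assumption: fusion is a weighted average; the paper may use another operator.
    """
    if not embeddings:
        raise ValueError("need at least one hypothesis embedding")
    if weights is None:
        weights = [1.0] * len(embeddings)
    if len(weights) != len(embeddings):
        raise ValueError("one weight per hypothesis is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(embeddings[0])
    fused = [0.0] * dim
    for vec, w in zip(embeddings, weights):
        if len(vec) != dim:
            raise ValueError("all embeddings must share one dimension")
        for i, value in enumerate(vec):
            fused[i] += (w / total) * value
    return fused


def dot(a: Vector, b: Vector) -> float:
    """Inner-product similarity, a common scoring function in dense retrieval."""
    return sum(x * y for x, y in zip(a, b))


def rank_documents(query: Vector, docs: Dict[str, Vector]) -> List[str]:
    """Return document IDs sorted by descending similarity to the query."""
    return sorted(docs, key=lambda doc_id: dot(query, docs[doc_id]), reverse=True)


# Toy vectors standing in for encoder outputs of each ASR hypothesis.
docs = {
    "video_a": fuse_nbest([[1.0, 0.0], [0.0, 1.0]]),  # hypotheses disagree
    "video_b": fuse_nbest([[0.0, 1.0], [0.0, 1.0]]),  # hypotheses agree
}
ranking = rank_documents([1.0, 0.0], docs)
```

In this toy case the second hypothesis for `video_a` misses the query direction entirely, but the fused vector still retains some of the signal from the first, which is the intuition behind using more than the 1-best transcript.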
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis · Text and Document Classification Technologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Residual Connection · Dense Connections · Layer Normalization
