SW-ASR: A Context-Aware Hybrid ASR Pipeline for Robust Single Word Speech Recognition

Manali Sharma (1); Riya Naik (1); Buvaneshwari G (1) ((1) Tetranetics Private Limited)

arXiv:2601.20890·cs.SD·January 30, 2026

Open Access

TL;DR

This paper introduces SW-ASR, a modular, context-aware hybrid speech recognition pipeline for single-word recognition in noisy, low-resource, real-world communication scenarios. It combines denoising, a hybrid ASR front end, and a verification layer.

Contribution

The paper proposes a modular framework that integrates denoising, a hybrid ASR front end, and a verification layer with multiple matching strategies, aiming to improve robustness in challenging conditions.

Findings

01 Verification layer improves accuracy in noisy conditions

02 LLM-based matching yields significant gains

03 Hybrid ASR performs well on clean audio

Abstract

Single-word Automatic Speech Recognition (ASR) is a challenging task due to the lack of linguistic context and sensitivity to noise, pronunciation variation, and channel artifacts, especially in low-resource, communication-critical domains such as healthcare and emergency response. This paper reviews recent deep learning approaches and proposes a modular framework for robust single-word detection. The system combines denoising and normalization with a hybrid ASR front end (Whisper + Vosk) and a verification layer designed to handle out-of-vocabulary words and degraded audio. The verification layer supports multiple matching strategies, including embedding similarity, edit distance, and LLM-based matching with optional contextual guidance. We evaluate the framework on the Google Speech Commands dataset and a curated real-world dataset collected from telephony and messaging platforms…
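To make the edit-distance matching strategy mentioned in the abstract concrete, here is a minimal sketch of how a verification step might snap a noisy ASR hypothesis onto a closed vocabulary. It is not the authors' implementation: the function names `edit_distance` and `verify_word`, the `max_ratio` threshold, and the sample vocabulary are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def verify_word(hypothesis: str,
                vocabulary: Sequence[str],
                max_ratio: float = 0.34) -> Tuple[Optional[str], int]:
    """Return (closest vocabulary word, distance), or (None, distance) on reject.

    max_ratio is a hypothetical threshold: a match is accepted only if the
    distance is at most max_ratio times the longer of the two strings.
    """
    hyp = hypothesis.strip().lower()
    best_word: Optional[str] = None
    best_dist = -1
    for word in vocabulary:
        d = edit_distance(hyp, word.lower())
        if best_word is None or d < best_dist:
            best_word, best_dist = word, d
    if best_word is None:
        return None, -1  # empty vocabulary
    limit = max_ratio * max(len(hyp), len(best_word))
    if best_dist <= limit:
        return best_word, best_dist
    return None, best_dist  # too far from every entry: treat as out-of-vocabulary


if __name__ == "__main__":
    vocab = ["stop", "go", "left", "right", "yes", "no"]  # illustrative only
    print(verify_word("stopp", vocab))
    print(verify_word("banana", vocab))
```

A length-relative threshold is used here so that short command words are not matched too permissively; an embedding-similarity or LLM-based matcher, as the abstract describes, could replace `edit_distance` behind the same interface.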

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders