VoiceAlign: A Shimming Layer for Enhancing the Usability of Legacy Voice User Interface Systems
Md Ehtesham-Ul-Haque, Syed Masum Billah

TL;DR
VoiceAlign is a novel adaptive layer that enhances legacy voice user interfaces by transforming natural commands into system-compatible syntax, significantly improving usability and reducing errors without modifying existing systems.
Contribution
We introduce VoiceAlign, a shimming layer that uses a language model to rewrite natural voice commands into the syntax legacy VUIs expect, improving interaction success and user experience.
Findings
Reduced command failures by 50%
Decreased commands needed per task by 25%
Achieved over 90% accuracy with a small, locally fine-tuned language model
Abstract
Voice user interfaces (VUIs) are rapidly transitioning from accessibility features to mainstream interaction modalities. Yet most operating systems' built-in voice commands remain underutilized despite possessing robust technical capabilities. Through our analysis of four commercial VUI systems and a formative study with 16 participants, we found that fixed command formats require exact phrasing, restrictive timeout mechanisms discard input during planning pauses, and insufficient feedback hampers multi-step interactions. To address these challenges, we developed VoiceAlign, an adaptive shimming layer that mediates between users and legacy VUI systems. VoiceAlign intercepts natural voice commands, transforms them to match the required syntax using a large language model, and transmits these adapted commands through a virtual audio channel that remains transparent to the underlying…
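The transformation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the command templates, the regular-expression patterns, and the function name `align` are all hypothetical, and a small rule-based matcher stands in for the language model. The virtual audio channel is not modeled; the function only returns the text that would be passed on to the legacy system.

```python
import re
from typing import Optional

# Hypothetical fixed grammar of a legacy VUI (assumption, not from the paper):
# each intent has exactly one accepted phrasing.
LEGACY_TEMPLATES = {
    "open_app": "Open {target}",
    "click": "Click {target}",
    "scroll": "Scroll {target}",
}

# Rule-based stand-in for the language model (assumption): each pattern maps
# loosely phrased input onto one intent and extracts its target.
PATTERNS = [
    (re.compile(r"\b(?:open|launch|start)\s+(?:up\s+)?(?:the\s+)?(?P<target>.+)", re.I), "open_app"),
    (re.compile(r"\b(?:click|press|tap|hit)\s+(?:on\s+)?(?:the\s+)?(?P<target>.+)", re.I), "click"),
    (re.compile(r"\bscroll\s+(?:the\s+page\s+)?(?P<target>up|down)\b", re.I), "scroll"),
]

# Filler words that a fixed-format recognizer would typically reject.
FILLERS = re.compile(r"\b(?:please|um|uh|for me)\b", re.I)


def align(utterance: str) -> Optional[str]:
    """Rewrite a natural utterance into the legacy syntax, or return None."""
    cleaned = FILLERS.sub(" ", utterance)
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .?!")
    for pattern, intent in PATTERNS:
        match = pattern.search(cleaned)
        if match:
            target = match.group("target").strip()
            return LEGACY_TEMPLATES[intent].format(target=target)
    # No known intent: a real shim would need a fallback, such as passing
    # the audio through unchanged or asking the user to rephrase.
    return None


if __name__ == "__main__":
    for spoken in [
        "Could you please open up the calculator?",
        "um, press the Save button",
        "scroll the page down",
    ]:
        print(f"{spoken!r} -> {align(spoken)!r}")
```

Because the shim emits only phrases the legacy system already accepts, the underlying VUI needs no modification, which is the property the abstract emphasizes.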
Taxonomy
Topics: AI in Service Interactions · Speech and dialogue systems · ICT in Developing Communities
