Demonstration of Adapt4Me: An Uncertainty-Aware Authoring Environment for Personalizing Automatic Speech Recognition to Non-normative Speech
Niclas Pokel, Yiming Zhao, Pehuén Moure, Yingqiang Gao, Roman Böhringer

TL;DR
Adapt4Me is a web-based environment that lets non-expert users personalize speech recognition models for non-normative speech through an interactive, uncertainty-aware, three-stage process.
Contribution
This work introduces Adapt4Me, a novel decentralized platform that operationalizes Bayesian active learning for user-driven ASR personalization without expert supervision.
Findings
Enables rapid speaker-specific ASR profiling.
Facilitates fast, incremental model updates via VI-LoRA.
Empowers users to iteratively refine models through visualized uncertainty.
Abstract
Personalizing Automatic Speech Recognition (ASR) for non-normative speech remains challenging because data collection is labor-intensive and model training is technically complex. To address these limitations, we propose Adapt4Me, a web-based decentralized environment that operationalizes Bayesian active learning to enable end-to-end personalization without expert supervision. The app exposes data selection, adaptation, and validation to lay users through a three-stage human-in-the-loop workflow: (1) rapid profiling via greedy phoneme sampling to capture speaker-specific acoustics; (2) backend personalization using Variational Inference Low-Rank Adaptation (VI-LoRA) to enable fast, incremental updates; and (3) continuous improvement, where users guide model refinement by resolving visualized model uncertainty via low-friction top-k corrections. By making epistemic uncertainty explicit,…
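As a rough illustration of stage (1), greedy phoneme sampling can be read as a set-cover heuristic: repeatedly pick the recording prompt that adds the most phonemes not yet covered. The sketch below is a minimal version under that assumption; the abstract does not specify the actual selection criterion, and the function name `greedy_phoneme_selection`, the `budget` parameter, and the toy prompt inventory are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_phoneme_selection(candidates: Dict[str, Set[str]], budget: int) -> List[str]:
    """Pick up to `budget` prompts, each time taking the prompt that
    contributes the most phonemes not covered by earlier picks.

    Illustrative set-cover heuristic only; not the authors' implementation.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(candidates)  # copy so the caller's dict is untouched

    while remaining and len(chosen) < budget:
        # Sort keys so ties are broken deterministically (alphabetically).
        best = max(sorted(remaining), key=lambda p: len(remaining[p] - covered))
        if not remaining[best] - covered:
            break  # no prompt adds a new phoneme; stop early
        chosen.append(best)
        covered |= remaining.pop(best)

    return chosen


# Hypothetical prompt inventory: prompt text -> set of phoneme labels.
PROMPTS = {
    "she sells": {"SH", "IY", "S", "EH", "L", "Z"},
    "sea shells": {"S", "IY", "SH", "EH", "L", "Z"},
    "by the shore": {"B", "AY", "DH", "AH", "SH", "AO", "R"},
}

if __name__ == "__main__":
    print(greedy_phoneme_selection(PROMPTS, budget=3))
```

With this toy inventory the loop stops after two prompts, because the third adds no unseen phoneme. A real profiling stage would presumably also weight phonemes by how informative they are for the speaker, which this sketch does not attempt.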
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Gaussian Processes and Bayesian Inference
