AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost
Ahmet Gündüz, Yunsu Kim, Kamer Ali Yuksel, Mohamed Al-Badrashiny, Thiago Castro Ferreira, and Hassan Sawaf

TL;DR
AutoMode-ASR is a framework that selects the best ASR system for each audio segment before any system is run, using a decision model trained on audio features, to improve transcription quality and reduce cost.
Contribution
It introduces a decision-based ensemble approach for selecting ASR systems per segment, optimizing quality and cost without modifying existing ASR models.
Findings
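16.2% relative reduction in WER compared to a single-best model for all segments
65% cost savings compared to the same baseline
75% speed improvement compared to the same baseline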
Abstract
We present AutoMode-ASR, a novel framework that effectively integrates multiple ASR systems to enhance the overall transcription quality while optimizing cost. The idea is to train a decision model to select the optimal ASR system for each segment based solely on the audio input before running the systems. We achieve this by ensembling binary classifiers determining the preference between two systems. These classifiers are equipped with various features, such as audio embeddings, quality estimation, and signal properties. Additionally, we demonstrate how using a quality estimator can further improve performance with minimal cost increase. Experimental results show a relative reduction in WER of 16.2%, a cost saving of 65%, and a speed improvement of 75%, compared to using a single-best model for all segments. Our framework is compatible with commercial and open-source black-box ASR…
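The selection step described in the abstract, an ensemble of binary classifiers that each decide the preference between two systems, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the majority-vote aggregation, the cost-based tie-break, the function name `select_system`, the system names, and the toy signal-to-noise rules are all hypothetical. The abstract only says that pairwise classifiers are ensembled and that they use features such as audio embeddings, quality estimation, and signal properties.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

# A pairwise classifier maps segment features to P(first system beats second).
PairwiseClassifier = Callable[[Sequence[float]], float]


def select_system(
    features: Sequence[float],
    systems: Sequence[str],
    classifiers: Dict[Tuple[str, str], PairwiseClassifier],
    cost: Dict[str, float],
) -> str:
    """Pick one system for a segment by majority vote over pairwise preferences."""
    wins = {name: 0 for name in systems}
    for a, b in combinations(systems, 2):
        p_a = classifiers[(a, b)](features)  # probability that `a` is preferred
        winner = a if p_a >= 0.5 else b
        wins[winner] += 1
    # Assumed tie-break (not from the paper): most wins first, then lowest cost.
    return min(systems, key=lambda s: (-wins[s], cost[s]))


# Toy setup: the only feature is a hypothetical signal-to-noise ratio in dB.
systems = ["small", "medium", "large"]
cost = {"small": 1.0, "medium": 2.0, "large": 4.0}
classifiers = {
    ("small", "medium"): lambda f: 0.7 if f[0] >= 20 else 0.3,
    ("small", "large"): lambda f: 0.8 if f[0] >= 20 else 0.2,
    ("medium", "large"): lambda f: 0.6 if f[0] >= 10 else 0.4,
}

clean_choice = select_system([25.0], systems, classifiers, cost)
mid_choice = select_system([15.0], systems, classifiers, cost)
noisy_choice = select_system([5.0], systems, classifiers, cost)
print(clean_choice, mid_choice, noisy_choice)
```

In this toy setup the cheapest system is chosen for clean audio and the most expensive one only for noisy audio, which is the kind of routing that could yield cost and speed gains without changing the underlying ASR systems. In practice each pairwise classifier would be a trained model rather than a hand-written rule.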
Taxonomy
Topics: Service-Oriented Architecture and Web Services · Semantic Web and Ontologies · Fault Detection and Control Systems
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
