Adaptive Edge-Cloud Inference for Speech-to-Action Systems Using ASR and Large Language Models
Mohammad Jalili Torkamani, Israt Zarin

TL;DR
This paper introduces ASTA, an adaptive system that dynamically routes speech recognition and language processing tasks between edge devices and cloud servers to balance performance, privacy, and resource use in voice-controlled IoT applications.
Contribution
The paper presents a novel adaptive routing mechanism that dynamically balances edge and cloud inference for speech-to-action systems, improving robustness and resource efficiency.
Findings
Achieves 62.5% ASR accuracy on diverse commands.
Routes all commands successfully, with a balanced split between online (cloud) and offline (edge) inference.
Enhances robustness through a rule-based command validation and repair component.
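The summary does not say how the rule-based validation and repair component works. As a rough illustration only, the sketch below assumes each parsed command is a device/action pair checked against a fixed vocabulary, with near-misses repaired by fuzzy string matching from Python's standard-library `difflib`. The vocabulary, the 0.6 similarity cutoff, and the names `ALLOWED`, `_repair`, and `validate_and_repair` are hypothetical and are not taken from ASTA.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical command vocabulary (not from the paper): device -> permitted actions.
ALLOWED = {
    "light": ("on", "off"),
    "fan": ("on", "off"),
    "thermostat": ("increase", "decrease"),
}


def _repair(value: str, choices, cutoff: float = 0.6) -> Optional[str]:
    """Return value if it is valid, else the closest allowed choice, else None."""
    value = value.strip().lower()
    if value in choices:
        return value
    # Fuzzy match against the allowed set; cutoff is an assumed threshold.
    matches = get_close_matches(value, list(choices), n=1, cutoff=cutoff)
    return matches[0] if matches else None


def validate_and_repair(command: dict) -> Optional[dict]:
    """Validate a parsed {"device", "action"} command, repairing near-misses."""
    device = _repair(str(command.get("device", "")), ALLOWED.keys())
    if device is None:
        return None  # unknown device: reject rather than guess
    action = _repair(str(command.get("action", "")), ALLOWED[device])
    if action is None:
        return None  # action is not valid for this device
    return {"device": device, "action": action}
```

For example, `validate_and_repair({"device": "Lights", "action": "of"})` returns `{"device": "light", "action": "off"}`, while an unknown device such as `"garage"` returns `None`. Rejecting unrecoverable commands, rather than forcing the nearest match, keeps a misheard phrase from triggering an unintended action.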
Abstract
Voice-based interaction has emerged as a natural and intuitive modality for controlling IoT devices. However, speech-driven edge devices face a fundamental trade-off between cloud-based solutions, which offer stronger language understanding capabilities at the cost of latency, connectivity dependence, and privacy concerns, and edge-based solutions, which provide low latency and improved privacy but are limited by computational constraints. This paper presents ASTA, an adaptive speech-to-action solution that dynamically routes voice commands between edge and cloud inference to balance performance and system resource utilization. ASTA integrates on-device automatic speech recognition and lightweight offline language-model inference with cloud-based LLM processing, guided by real-time system metrics such as CPU workload, device temperature, and network latency. A metric-aware routing…
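The abstract names the signals that guide routing (CPU workload, device temperature, and network latency) but is cut off before describing the routing rule itself. A minimal sketch of one possible metric-aware policy follows, assuming fixed thresholds. The threshold values, the edge-by-default choice, and the names `SystemMetrics` and `route` are illustrative assumptions, not the paper's method.

```python
from dataclasses import dataclass

# Hypothetical thresholds; ASTA's actual values are not given in this summary.
MAX_CPU_PERCENT = 85.0
MAX_TEMPERATURE_C = 70.0
MAX_LATENCY_MS = 300.0


@dataclass
class SystemMetrics:
    """Snapshot of the real-time signals the abstract says guide routing."""
    cpu_percent: float
    temperature_c: float
    network_latency_ms: float
    online: bool


def route(m: SystemMetrics) -> str:
    """Return "edge" or "cloud" for one voice command (illustrative policy)."""
    # If the network is down or too slow, fall back to on-device inference.
    if not m.online or m.network_latency_ms > MAX_LATENCY_MS:
        return "edge"
    # A busy or hot device offloads language processing to the cloud LLM.
    if m.cpu_percent > MAX_CPU_PERCENT or m.temperature_c > MAX_TEMPERATURE_C:
        return "cloud"
    # Otherwise stay local for lower latency and better privacy.
    return "edge"


if __name__ == "__main__":
    print(route(SystemMetrics(20.0, 45.0, 50.0, True)))
    print(route(SystemMetrics(95.0, 45.0, 50.0, True)))
    print(route(SystemMetrics(95.0, 80.0, 50.0, False)))
```

Checking connectivity before device load means a stressed but offline device still answers locally. A deployed system might instead combine the metrics into a weighted score or add hysteresis so that commands do not flap between targets when a metric hovers near its threshold.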
Taxonomy
Topics: IoT and Edge/Fog Computing · Speech Recognition and Synthesis · EEG and Brain-Computer Interfaces
