End-to-end Alexa Device Arbitration
Jarred Barber, Yifeng Fan, Tao Zhang

TL;DR
This paper presents an end-to-end machine learning approach to device arbitration in smart homes: when several microphone-array devices hear the same keyword, the system determines which device is closest to the user, improving over a traditional signal processing baseline.
Contribution
The paper introduces a novel end-to-end learning system for device arbitration that learns a feature embedding computed independently on each device, then aggregates the embeddings across devices to make the arbitration decision.
Findings
The system outperforms traditional signal processing baselines.
Large-scale room simulation provides effective training and evaluation data.
Aggregating per-device embeddings improves identification of the closest device.
Abstract
We introduce a variant of the speaker localization problem, which we call device arbitration. In the device arbitration problem, a user utters a keyword that is detected by multiple distributed microphone arrays (smart home devices), and we want to determine which device was closest to the user. Rather than solving the full localization problem, we propose an end-to-end machine learning system. This system learns a feature embedding that is computed independently on each device. The embeddings from each device are then aggregated together to produce the final arbitration decision. We use a large-scale room simulation to generate training and evaluation data, and compare our system against a signal processing baseline.
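The abstract describes the architecture only at a high level, so here is one minimal way it could be realized. Every concrete choice is an assumption for illustration, not the authors' design: each device reduces its keyword audio to a fixed-length feature vector, a hypothetical shared-weight `embed` layer runs independently on each device, and a hypothetical `arbitrate` step mean-pools the embeddings and applies a softmax over devices. The `closest_device` helper shows how a room simulation with known user and device positions yields the training label. Weights are random and untrained.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense matrix-vector product: w has shape (rows, len(x))."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, a) for a in v]


def embed(features: Sequence[float], w_embed: Matrix) -> Vector:
    """Hypothetical per-device embedding: one shared linear layer + ReLU.
    Uses only this device's own features, so it can run on-device."""
    return relu(matvec(w_embed, features))


def arbitrate(device_features: Sequence[Sequence[float]], w_embed: Matrix,
              w_self: Matrix, w_ctx: Matrix, v: Sequence[float]) -> Vector:
    """Hypothetical aggregation: returns P(device i is closest) per device."""
    embeddings = [embed(f, w_embed) for f in device_features]
    n, dim = len(embeddings), len(embeddings[0])
    # Order-independent aggregate: mean embedding over all devices.
    context = [sum(e[k] for e in embeddings) / n for k in range(dim)]
    ctx_term = matvec(w_ctx, context)
    scores = []
    for e in embeddings:
        # Score each device using its own embedding plus the shared context.
        hidden = relu([a + b for a, b in zip(matvec(w_self, e), ctx_term)])
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Softmax over devices, shifted by the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def closest_device(user_pos: Sequence[float],
                   device_positions: Sequence[Sequence[float]]) -> int:
    """Training label from a simulated room: index of the nearest device."""
    return min(range(len(device_positions)),
               key=lambda i: math.dist(user_pos, device_positions[i]))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    feat_dim, emb_dim, hid_dim = 8, 4, 4
    w_embed = random_matrix(emb_dim, feat_dim, rng)
    w_self = random_matrix(hid_dim, emb_dim, rng)
    w_ctx = random_matrix(hid_dim, emb_dim, rng)
    v = [rng.gauss(0.0, 0.5) for _ in range(hid_dim)]
    # Three devices with made-up feature vectors (weights are untrained).
    feats = [[rng.random() for _ in range(feat_dim)] for _ in range(3)]
    probs = arbitrate(feats, w_embed, w_self, w_ctx, v)
    print([round(p, 3) for p in probs], "-> pick device", probs.index(max(probs)))
    # User at (1, 2); the device at (1.5, 2.5) is nearest, so this prints 1.
    print("label:", closest_device((1.0, 2.0), [(0.0, 0.0), (1.5, 2.5), (4.0, 1.0)]))
```

Mean pooling keeps the decision independent of device order, and the softmax yields a distribution over devices, so such a model could be trained with cross-entropy against the `closest_device` label. The abstract does not specify which aggregation or loss the authors actually used.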
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Music and Audio Processing
