End-to-end Alexa Device Arbitration

Jarred Barber; Yifeng Fan; Tao Zhang

arXiv:2112.04914 · eess.AS · February 18, 2022

TL;DR

This paper presents an end-to-end machine learning approach to device arbitration in smart homes: when several devices with microphone arrays detect the same keyword, the system determines which device is closest to the user. It improves on a traditional signal processing baseline.

Contribution

The paper introduces an end-to-end learning system for device arbitration. The system learns a feature embedding that is computed independently on each device, then aggregates the per-device embeddings to make the arbitration decision.

Findings

01. The system outperforms traditional signal processing baselines.

02. Large-scale simulated data effectively trains the model.

03. Embedding aggregation improves device proximity detection.

Abstract

We introduce a variant of the speaker localization problem, which we call device arbitration. In the device arbitration problem, a user utters a keyword that is detected by multiple distributed microphone arrays (smart home devices), and we want to determine which device was closest to the user. Rather than solving the full localization problem, we propose an end-to-end machine learning system. This system learns a feature embedding that is computed independently on each device. The embeddings from each device are then aggregated together to produce the final arbitration decision. We use a large-scale room simulation to generate training and evaluation data, and compare our system against a signal processing baseline.
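The pipeline the abstract describes (an embedding computed independently on each device, followed by aggregation into one decision) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `tanh` linear encoder, the mean-pooled context vector, the linear scoring head, and the names `embed`, `arbitrate`, `W`, and `v`. In a real system the weights would be learned from data; here they are random.

```python
import numpy as np


def embed(features, W):
    """Per-device embedding. The same weights W are used on every device,
    and each device's embedding depends only on its own features."""
    return np.tanh(features @ W)


def arbitrate(device_features, W, v):
    """Aggregate per-device embeddings and pick one device.

    device_features: array of shape (n_devices, feat_dim)
    W: hypothetical encoder weights, shape (feat_dim, emb_dim)
    v: hypothetical scoring weights, shape (2 * emb_dim,)
    Returns (index of chosen device, probability per device).
    """
    # Step 1: compute each embedding independently, as if on-device.
    E = np.stack([embed(f, W) for f in device_features])

    # Step 2: aggregate. Mean pooling is an assumed choice; it gives a
    # summary that does not depend on the order of the devices.
    context = E.mean(axis=0)

    # Step 3: score each device from its own embedding plus the summary.
    joint = np.concatenate([E, np.broadcast_to(context, E.shape)], axis=1)
    scores = joint @ v

    # Softmax over devices (shifted by the max for numerical stability).
    exp_scores = np.exp(scores - scores.max())
    probs = exp_scores / exp_scores.sum()
    return int(np.argmax(probs)), probs


# Toy example: 3 devices, 8 features each, random (untrained) weights.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))
W = rng.normal(size=(8, 4))
v = rng.normal(size=8)
winner, probs = arbitrate(feats, W, v)
```

Because the summary is order-independent and every device is scored with the same weights, reordering the devices reorders the output probabilities in the same way. That property suits a setting where the number and ordering of devices in a home is arbitrary.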

Taxonomy

Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Music and Audio Processing