Jointly Optimizing Sensing Pipelines for Multimodal Mixed Reality Interaction

Darshana Rathnayake, Ashen de Silva, Dasun Puwakdandawa, Lakmal Meegahapola, Archan Misra, Indika Perera

arXiv:2010.06584 · cs.HC · December 21, 2020

TL;DR

This paper introduces a sensor fusion architecture for multimodal mixed reality interaction that dynamically balances model complexity across visual, speech, and gestural inputs to reduce latency and improve accuracy on resource-constrained devices.

Contribution

It presents a reconfigurable, cross-modal sensor fusion system that selects the complexity of each modality's model based on context, reducing comprehension latency roughly threefold and improving comprehension accuracy by 10-15%.
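To make the idea of context-dependent model selection concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical offline profile that, for each context, maps every (vision model, speech model) pair to an end-to-end latency and a fused accuracy, and it picks the fastest pair that meets an accuracy target. The names `select_combo`, `profiles`, `small_cnn`, `large_asr` and all the numbers are illustrative assumptions, not values from the paper.

```python
from typing import Dict, Optional, Tuple

# Hypothetical offline profile for one context:
# (vision_model, speech_model) -> (end_to_end_latency_ms, fused_accuracy)
ComboTable = Dict[Tuple[str, str], Tuple[float, float]]


def select_combo(table: ComboTable, min_accuracy: float) -> Optional[Tuple[str, str]]:
    """Return the lowest-latency model pair whose fused accuracy meets
    min_accuracy, or None if no pair qualifies."""
    feasible = [
        (latency, combo)
        for combo, (latency, accuracy) in table.items()
        if accuracy >= min_accuracy
    ]
    if not feasible:
        return None
    # min() over (latency, combo) tuples picks the smallest latency first.
    return min(feasible)[1]


# Toy numbers for illustration only; they are not measurements from the paper.
profiles: Dict[str, ComboTable] = {
    "quiet": {
        ("small_cnn", "small_asr"): (40.0, 0.86),
        ("small_cnn", "large_asr"): (120.0, 0.90),
        ("large_cnn", "small_asr"): (150.0, 0.91),
        ("large_cnn", "large_asr"): (160.0, 0.93),
    },
    "noisy": {
        ("small_cnn", "small_asr"): (40.0, 0.70),
        ("small_cnn", "large_asr"): (120.0, 0.80),
        ("large_cnn", "small_asr"): (150.0, 0.87),
        ("large_cnn", "large_asr"): (160.0, 0.89),
    },
}

for context, table in profiles.items():
    print(context, select_combo(table, min_accuracy=0.85))
```

In the toy "noisy" profile, the cheapest feasible pair keeps the lightweight speech model and pays for a heavier vision model instead. A real system would also have to infer the current context at run time, which this sketch leaves out.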

Findings

01. 3-fold reduction in comprehension latency
02. 10-15% increase in accuracy
03. Model combination performance varies with context

Abstract

Natural human interactions for Mixed Reality applications are overwhelmingly multimodal: humans communicate intent and instructions via a combination of visual, aural and gestural cues. However, supporting low-latency and accurate comprehension of such multimodal instructions (MMI) on resource-constrained wearable devices remains an open challenge, especially as the state-of-the-art comprehension techniques for each individual modality increasingly utilize complex Deep Neural Network models. We demonstrate the possibility of overcoming the core limitation of the latency-vs.-accuracy tradeoff by exploiting cross-modal dependencies, i.e., by compensating for the inferior performance of one model with the increased accuracy of a more complex model of a different modality. We present a sensor fusion architecture that performs MMI comprehension in a quasi-synchronous fashion, by fusing…
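The cross-modal compensation described in the abstract can be illustrated with a generic late-fusion step. The sketch below is a swapped-in textbook technique (naive-Bayes-style product of per-modality scores with a uniform prior), not the paper's fusion architecture, whose details are truncated above. The function `fuse_scores`, the candidate names and the score values are hypothetical.

```python
import math
from typing import Dict


def fuse_scores(*modalities: Dict[str, float]) -> Dict[str, float]:
    """Late fusion by multiplying per-modality scores for each candidate
    referent and renormalizing (naive-Bayes style, uniform prior).
    Only candidates present in every modality are kept."""
    candidates = set(modalities[0])
    for scores in modalities[1:]:
        candidates &= set(scores)
    if not candidates:
        return {}
    # Sum log-scores to avoid underflow; clamp zeros to a tiny epsilon.
    log_scores = {
        c: sum(math.log(max(scores[c], 1e-12)) for scores in modalities)
        for c in candidates
    }
    peak = max(log_scores.values())
    unnormalized = {c: math.exp(s - peak) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical outputs for "put that cup there" while pointing:
speech = {"red_cup": 0.40, "blue_cup": 0.35, "plate": 0.25}   # cheap ASR, ambiguous
gesture = {"red_cup": 0.10, "blue_cup": 0.80, "plate": 0.10}  # heavier pointing model, confident

fused = fuse_scores(speech, gesture)
print(max(fused, key=fused.get))  # blue_cup
```

Here the ambiguous speech scores alone would favor `red_cup`, but the more confident gesture scores shift the fused decision to `blue_cup`, which is the kind of compensation between modalities the abstract refers to.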
