Lessons Learned from Developing a Privacy-Preserving Multimodal Wearable for Local Voice-and-Vision Inference
Yonatan Tussa, Andy Heredia, Nirupam Roy

TL;DR
This paper discusses the design and development of a privacy-preserving, ear-mounted multimodal wearable device that performs local voice and vision inference, addressing privacy, power, and usability challenges.
Contribution
It presents a hardware-software co-design approach for a compact, offline multimodal AI wearable, highlighting key design lessons and feasibility on mobile hardware.
Findings
Local multimodal inference is feasible on commodity mobile hardware.
Design challenges include power management, connectivity, and social acceptability.
The system enables privacy-preserving, interactive voice and vision AI in a wearable form factor.
Abstract
Many promising applications of multimodal wearables require continuous sensing and heavy computation, yet users reject such devices due to privacy concerns. This paper shares our experiences building an ear-mounted voice-and-vision wearable that performs local AI inference using a paired smartphone as a trusted personal edge. We describe the hardware-software co-design of this privacy-preserving system, including challenges in integrating a camera, microphone, and speaker within a 30-gram form factor, enabling wake-word-triggered capture, and running quantized vision-language and large language models entirely offline. Through iterative prototyping, we identify key design hurdles in power budgeting, connectivity, latency, and social acceptability. Our initial evaluation shows that fully local multimodal inference is feasible on commodity mobile hardware with interactive latency. We…
Taxonomy
Topics: ICT in Developing Communities · Innovative Human-Technology Interaction · Mobile Crowdsensing and Crowdsourcing
