VueBuds: Visual Intelligence with Wireless Earbuds
Maruchi Kim, Rasya Fawwaz, Zhi Yang Lim, Brinda Moudgalya, Hexi Wang, Yuanhao Zeng, Shyamnath Gollakota

TL;DR
VueBuds introduces the first camera-integrated wireless earbuds capable of real-time visual understanding within strict power and size constraints, enabling advanced egocentric vision applications.
Contribution
This work presents a novel low-power, camera-equipped wireless earbud design that integrates vision language models for real-time scene understanding.
Findings
VueBuds achieves comparable response quality to Ray-Ban Meta in visual question-answering tasks.
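The low-resolution monochrome cameras draw under 5 mW through on-demand activation, and the combined binocular view provides comprehensive forward coverage even though the face partially occludes each camera's field of view.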
User studies with 90 participants validate VueBuds' effectiveness against smart glasses.
Abstract
Despite their ubiquity, wireless earbuds remain audio-centric due to size and power constraints. We present VueBuds, the first camera-integrated wireless earbuds for egocentric vision, capable of operating within stringent power and form-factor limits. Each VueBud embeds a camera into a Sony WF-1000XM3 to stream visual data over Bluetooth to a host device for on-device vision language model (VLM) processing. We show analytically and empirically that while each camera's field of view is partially occluded by the face, the combined binocular perspective provides comprehensive forward coverage. By integrating VueBuds with VLMs, we build an end-to-end system for real-time scene understanding, translation, visual reasoning, and text reading, all from low-resolution monochrome cameras drawing under 5 mW through on-demand activation. Through online and in-person user studies with 90…
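The coverage claim in the abstract can be illustrated with simple top-down geometry. The sketch below is not the paper's analysis: it assumes two cameras separated by a hypothetical baseline (roughly ear-to-ear width), each able to see up to a hypothetical `inward_limit_deg` toward the midline before the face blocks the view, and up to `outward_limit_deg` away from it. All names and numbers are illustrative assumptions.

```python
import math


def min_midline_distance(baseline_m: float, inward_limit_deg: float) -> float:
    """Closest point straight ahead (on the midline) seen by at least one camera.

    A midline point at distance z lies at bearing atan((baseline/2) / z) toward
    the face from either camera, so it becomes visible once that bearing drops
    to the assumed inward limit.
    """
    if not 0 < inward_limit_deg < 90:
        raise ValueError("inward_limit_deg must be between 0 and 90")
    return (baseline_m / 2) / math.tan(math.radians(inward_limit_deg))


def covered(x: float, z: float, baseline_m: float,
            inward_limit_deg: float, outward_limit_deg: float) -> bool:
    """True if point (x, z) falls in at least one camera's horizontal view.

    x is lateral offset from the midline, z is forward distance (both metres).
    """
    # Left camera sits at -baseline/2 (inward is +x); right camera mirrors it.
    for cam_x, inward_sign in ((-baseline_m / 2, 1.0), (baseline_m / 2, -1.0)):
        # Bearing from this camera, positive when pointing toward the midline.
        bearing = math.degrees(math.atan2((x - cam_x) * inward_sign, z))
        if -outward_limit_deg <= bearing <= inward_limit_deg:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical values: 15 cm baseline, 10 degrees inward, 60 degrees outward.
    print(min_midline_distance(0.15, 10.0))
    print(covered(0.0, 0.5, 0.15, 10.0, 60.0))
    print(covered(0.0, 0.2, 0.15, 10.0, 60.0))
```

Under these assumptions, each camera alone misses part of the scene on the far side of the face, but the union of the two views covers everything ahead beyond a short blind wedge near the nose, whose depth shrinks as the inward limit grows.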
