LUMIA: A Handheld Vision-to-Music System for Real-Time, Embodied Composition

Chung-Ta Huang; Connie Cheng; Vealy Lai

arXiv:2512.17228·cs.HC·December 22, 2025

TL;DR

LUMIA is a handheld device that transforms visual scenes into music in real time, enabling embodied, improvisational composition by combining a vision-language model with user interaction.

Contribution

The paper introduces an embodied approach to music creation that integrates vision, language, and sound in a handheld system for real-time, improvisational composition.

Findings

01. Enables real-time visual-to-music transformation.

02. Supports embodied, improvisational workflows.

03. Bridges perception and musical creation through AI.

Abstract

Most digital music tools emphasize precision and control but often lack support for tactile, improvisational workflows grounded in environmental interaction. Lumia addresses this by enabling users to "compose through looking," transforming visual scenes into musical phrases using a handheld, camera-based interface and large multimodal models. A vision-language model (GPT-4V) analyzes captured imagery to generate structured prompts, which, combined with user-selected instrumentation, guide a text-to-music pipeline (Stable Audio). This real-time process allows users to frame, capture, and layer audio interactively, producing loopable musical segments through embodied interaction. The system supports a co-creative workflow in which human intent and model inference together shape the musical outcome. By embedding generative AI within a physical device, Lumia bridges perception and composition,…
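The capture-to-loop flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ScenePrompt`, `build_music_prompt`, `LoopSession`, and the `analyze`/`generate` callables are hypothetical names, and the prompt fields (mood, tempo, descriptors) are assumed because the abstract does not specify the structured prompt's schema. The vision-language and text-to-music models are passed in as plain functions so that no particular API is implied.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ScenePrompt:
    """Hypothetical structured output of the vision-language step (schema assumed)."""
    mood: str
    tempo_bpm: int
    descriptors: List[str] = field(default_factory=list)


def build_music_prompt(scene: ScenePrompt, instrument: str) -> str:
    """Merge the scene analysis with the user-selected instrument into one text prompt."""
    parts = [instrument, scene.mood, *scene.descriptors,
             f"{scene.tempo_bpm} BPM", "loopable"]
    # Drop empty strings so a missing field does not leave a dangling comma.
    return ", ".join(p for p in parts if p)


@dataclass
class LoopSession:
    """Collects the audio layers a user has captured so far."""
    layers: List[Any] = field(default_factory=list)

    def capture(
        self,
        image: Any,
        instrument: str,
        analyze: Callable[[Any], ScenePrompt],   # stand-in for the vision-language model
        generate: Callable[[str], Any],          # stand-in for the text-to-music model
    ) -> str:
        """Run one frame -> prompt -> clip cycle and stack the clip as a new layer."""
        scene = analyze(image)
        prompt = build_music_prompt(scene, instrument)
        self.layers.append(generate(prompt))
        return prompt
```

In this sketch each call to `capture` adds one layer, which mirrors the "frame, capture, and layer" interaction; how the real device mixes or loops the layers is not stated in the abstract and is left out here.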

Taxonomy

Topics: Music Technology and Sound Studies · Innovative Human-Technology Interaction · Interactive and Immersive Displays