Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning
Michael Xieyang Liu, Savvas Petridis, Vivian Tsai, Alexander J. Fiannaca, Alex Olwal, Michael Terry, Carrie J. Cai

TL;DR
Gensors leverages multimodal foundation models to enable users to create, debug, and customize personalized visual sensors through natural language, enhancing control and understanding of AI sensing systems.
Contribution
The paper introduces Gensors, a novel system that supports requirement elicitation, debugging, and customization of AI sensors using multimodal models and reasoning capabilities.
Findings
Users felt a greater sense of control over, and understanding of, their sensors with Gensors.
Gensors uncovered user blind spots and overlooked criteria.
The system improved sensor debugging and customization processes.
Abstract
Multimodal large language models (MLLMs), with their expansive world knowledge and reasoning capabilities, present a unique opportunity for end-users to create personalized AI sensors capable of reasoning about complex situations. A user could describe a desired sensing task in natural language (e.g., "alert if my toddler is getting into mischief"), with the MLLM analyzing the camera feed and responding within seconds. In a formative study, we found that users saw substantial value in defining their own sensors, yet struggled to articulate their unique personal requirements and debug the sensors through prompting alone. To address these challenges, we developed Gensors, a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. Gensors 1) assists users in eliciting requirements through both automatically-generated and manually created…
