Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning

Michael Xieyang Liu; Savvas Petridis; Vivian Tsai; Alexander J. Fiannaca; Alex Olwal; Michael Terry; Carrie J. Cai

arXiv:2501.15727·cs.HC·January 28, 2025

TL;DR

Gensors leverages multimodal foundation models to enable users to create, debug, and customize personalized visual sensors through natural language, enhancing control and understanding of AI sensing systems.

Contribution

The paper introduces Gensors, a novel system that uses the reasoning capabilities of multimodal large language models to support requirement elicitation, debugging, and customization of AI sensors.

Findings

01

Users felt greater control and understanding with Gensors.

02

Gensors uncovered user blind spots and overlooked criteria.

03

The system improved sensor debugging and customization processes.

Abstract

Multimodal large language models (MLLMs), with their expansive world knowledge and reasoning capabilities, present a unique opportunity for end-users to create personalized AI sensors capable of reasoning about complex situations. A user could describe a desired sensing task in natural language (e.g., "alert if my toddler is getting into mischief"), with the MLLM analyzing the camera feed and responding within seconds. In a formative study, we found that users saw substantial value in defining their own sensors, yet struggled to articulate their unique personal requirements and debug the sensors through prompting alone. To address these challenges, we developed Gensors, a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. Gensors 1) assists users in eliciting requirements through both automatically generated and manually created…
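The abstract describes a sensor as a natural-language task that an MLLM evaluates against a camera feed. The sketch below illustrates that idea under assumptions of our own and is not the Gensors implementation: `Sensor`, `build_prompt`, and `parse_verdict` are hypothetical names, user requirements are assumed to be plain-text strings, and the model is assumed to answer YES or NO before explaining its reasoning. It only composes the text prompt and parses a reply; sending the prompt together with a camera frame to an actual MLLM is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Sensor:
    """Hypothetical representation of a user-defined visual sensor."""
    # Natural-language sensing task, e.g. "alert if my toddler is getting into mischief".
    task: str
    # User-specific requirements; plain-text strings are an assumption of this sketch.
    requirements: list[str] = field(default_factory=list)


def build_prompt(sensor: Sensor) -> str:
    """Compose the text part of a (hypothetical) MLLM request for one camera frame."""
    lines = [f"Sensing task: {sensor.task}"]
    if sensor.requirements:
        lines.append("Take these user-specific requirements into account:")
        # Number each requirement so the model can refer to it in its reasoning.
        lines.extend(f"{i}. {req}" for i, req in enumerate(sensor.requirements, start=1))
    # Assumed response format: a YES/NO verdict first, then the explanation.
    lines.append("Answer YES or NO first, then explain your reasoning.")
    return "\n".join(lines)


def parse_verdict(response: str) -> bool:
    """Return True only if the first word of the reply is YES (assumed format)."""
    tokens = response.split()
    return bool(tokens) and tokens[0].strip(".,:;!").upper() == "YES"
```

For example, `build_prompt(Sensor(task="alert if my toddler is getting into mischief", requirements=["The toddler is climbing furniture"]))` produces a four-line prompt whose requirement appears as `1. The toddler is climbing furniture` (an invented example requirement), and `parse_verdict("YES. The toddler is on the couch.")` returns `True`. Keeping the verdict separate from the explanation is one way to let a user read the model's reasoning when debugging a sensor.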
