OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs

Jiahao Nick Li; Yan Xu; Tovi Grossman; Stephanie Santosa; Michelle Li

arXiv:2405.03901·cs.HC·May 8, 2024

Open Access

TL;DR

This paper introduces OmniActions, a system that uses large language models to predict digital follow-up actions based on multimodal sensory inputs, aiming to facilitate seamless interaction in augmented reality environments.

Contribution

The paper presents a novel pipeline leveraging LLMs to predict digital actions from multimodal inputs, grounded in a comprehensive design space derived from user data.

Findings

01. Identified effective LLM techniques for action prediction

02. Developed an interactive prototype demonstrating system feasibility

03. Gathered user feedback on action prediction accuracy

Abstract

The progression to "Pervasive Augmented Reality" envisions easy access to multimodal information continuously. However, in many everyday scenarios, users are occupied physically, cognitively or socially. This may increase the friction to act upon the multimodal information that users encounter in the world. To reduce such friction, future interactive interfaces should intelligently provide quick access to digital actions based on users' context. To explore the range of possible digital actions, we conducted a diary study that required participants to capture and share the media that they intended to perform actions on (e.g., images or audio), along with their desired actions and other contextual information. Using this data, we generated a holistic design space of digital follow-up actions that could be performed in response to different types of multimodal sensory inputs. We then…

Taxonomy

Topics: Advanced Text Analysis Techniques · Personal Information Management and User Behavior · Speech and dialogue systems