Learning Next Action Predictors from Human-Computer Interaction

Omar Shaikh; Valentin Teutschbein; Kanishk Gandhi; Yikun Chi; Nick Haber; Thomas Robinson; Nilam Ram; Byron Reeves; Sherry Yang; Michael S. Bernstein; Diyi Yang

arXiv:2603.05923 · cs.CL · March 9, 2026

TL;DR

This paper introduces an approach for predicting a user's next action during human-computer interaction. It combines long interaction histories, scalable annotation of naturalistic computer use with vision-language models, and a model called LongNAP that outperforms baselines.

Contribution

The paper contributes three things. The first is a large-scale annotated dataset of user interactions. The second is LongNAP, a novel model that combines parametric and in-context learning. The third is evidence of improved next action prediction performance.

Findings

1. LongNAP significantly outperforms supervised and prompted baselines.

2. 17.1% of predictions are well aligned with users' actual actions.

3. The dataset includes over 360K annotated actions from 20 users over one month.
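The headline figures imply a per-user and per-hour scale that is easy to overlook. The arithmetic below uses the rounded numbers stated on this page: over 360K actions and 20 users from the findings, plus 1,800 hours of screen time from the abstract. It also assumes a 30-day month, which is an assumption rather than a figure from the paper. Because 360K is a lower bound, the derived action counts are lower bounds too.

```python
# Back-of-the-envelope scale of the dataset from the rounded figures on this page.
ACTIONS = 360_000      # lower bound: the page says "over 360K" annotated actions
USERS = 20
SCREEN_HOURS = 1_800   # "1,800 hours of screen time" (abstract)
DAYS = 30              # ASSUMPTION: "one month" taken as 30 days

actions_per_user = ACTIONS / USERS                  # annotated actions per participant
hours_per_user = SCREEN_HOURS / USERS               # screen time per participant
hours_per_user_per_day = hours_per_user / DAYS      # average daily screen time
actions_per_screen_hour = ACTIONS / SCREEN_HOURS    # annotation density

print(f"actions per user:          {actions_per_user:,.0f}")
print(f"screen hours per user:     {hours_per_user:.1f}")
print(f"screen hours per user/day: {hours_per_user_per_day:.1f}")
print(f"actions per screen hour:   {actions_per_screen_hour:.1f}")
print(f"actions per screen minute: {actions_per_screen_hour / 60:.2f}")
```

Under these assumptions, each participant averages about 3 hours of screen time per day. Each hour of screen time yields roughly 200 labeled actions, or a little over 3 per minute.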

Abstract

Truly proactive AI systems must anticipate what we will do next. This foresight demands far richer information than the sparse signals we type into our prompts -- it demands reasoning over the entire context of what we see and do. We formalize this as next action prediction (NAP): given a sequence of a user's multimodal interactions with a computer (screenshots, clicks, sensor data), predict that user's next action. Progress on this task requires both new data and modeling approaches. To scale data, we annotate longitudinal, naturalistic computer use with vision-language models. We release an open-source pipeline for performing this labeling on private infrastructure, and label over 360K actions across one month of continuous phone usage from 20 users, amounting to 1,800 hours of screen time. We then introduce LongNAP, a user model that combines parametric and in-context learning to…
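The abstract defines NAP as a mapping from a sequence of a user's multimodal interactions to that user's next action. The sketch below makes that input/output contract concrete with a hypothetical `Interaction` record and a deliberately naive bigram baseline over action labels. Every name here is illustrative. This is not LongNAP, whose combination of parametric and in-context learning is not detailed on this page. A text-only bigram model also ignores screenshots and sensor data, which is exactly the richer context the abstract argues proactive systems need.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Interaction:
    """One step of computer use (hypothetical schema, not the paper's)."""
    screenshot: str                              # path or ID of the screen at this step
    action: str                                  # natural-language label of what the user did
    sensors: dict = field(default_factory=dict)  # optional sensor readings (unused here)


class BigramNextActionBaseline:
    """Naive NAP baseline: predict the action that most often followed the last one."""

    def __init__(self) -> None:
        self.transitions: dict[str, Counter] = defaultdict(Counter)
        self.unigrams: Counter = Counter()

    def fit(self, history: list[Interaction]) -> "BigramNextActionBaseline":
        # Count action -> next-action transitions across the user's history.
        for prev, nxt in zip(history, history[1:]):
            self.transitions[prev.action][nxt.action] += 1
        # Overall action frequencies, used as a fallback.
        self.unigrams.update(step.action for step in history)
        return self

    def predict(self, context: list[Interaction]) -> Optional[str]:
        # Prefer the most frequent successor of the most recent action.
        if context:
            successors = self.transitions.get(context[-1].action)
            if successors:
                return successors.most_common(1)[0][0]
        # Otherwise fall back to the user's most frequent action overall.
        if self.unigrams:
            return self.unigrams.most_common(1)[0][0]
        return None


# Usage with a toy, made-up interaction history.
history = [
    Interaction("s0.png", "open Messages"),
    Interaction("s1.png", "reply to Alex"),
    Interaction("s2.png", "open Maps"),
    Interaction("s3.png", "open Messages"),
    Interaction("s4.png", "reply to Alex"),
]
model = BigramNextActionBaseline().fit(history)
print(model.predict([Interaction("s5.png", "open Messages")]))  # reply to Alex
```

A baseline like this is useful mainly as a floor. Any model that reasons over long, multimodal context should clearly beat simple action-frequency statistics.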

Taxonomy

Topics: Personal Information Management and User Behavior · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)