How You Move Tells What You'll Do: Trajectory-Conditioned Egocentric Prediction

Sejoon Jun; Hai Nguyen-Truong; Luigi Seminara; Lorenzo Torresani

arXiv:2605.20388·cs.CV·May 21, 2026

TL;DR

This paper introduces TrajPilot, a model that predicts future egocentric camera trajectories to improve action and goal prediction, outperforming existing baselines across multiple datasets and tasks.

Contribution

The paper shows that predicted future camera trajectories, used as a conditioning signal, improve egocentric action prediction and planning, outperforming language-based conditioning as well as VLM and structured-planner baselines.

Findings

01. TrajPilot outperforms VLM and structured-planner baselines on multiple datasets.

02. Trajectory-based conditioning improves prediction, especially at longer horizons.

03. The model maintains performance when camera poses are estimated from RGB alone.

Abstract

Predicting how a person's first-person view will evolve (what action will follow, what plan completes a task, whether an in-progress shot will score) is fundamentally under-specified: the same context admits many plausible futures, and a model trained to minimize prediction error is forced to hedge or average across them, getting it wrong either way. Two findings shape our approach. First, the future camera trajectory, the path the head carves through space, lets the model commit to one of those futures: it carries the operator's intent in a form fine enough to determine how an action will unfold, substantially outperforming language as a conditioning signal. Second, this same intent makes the trajectory itself partially predictable from the context at hand, enough that trajectory need not be observed at test time to recover most of the gain. We instantiate these findings as TrajPilot,…
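The under-specification argument in the abstract can be illustrated with a toy simulation. The sketch below is not TrajPilot and uses nothing from the paper's implementation: the function name `simulate`, the binary hidden intent, and the noise level are all assumptions chosen for illustration. It only shows the general point that a predictor minimizing squared error over two equally likely futures settles on their average, while a predictor that also sees a trajectory-like signal can commit to one of them.

```python
import random
from statistics import mean


def simulate(n=10_000, seed=0):
    """Toy comparison of unconditioned vs. trajectory-conditioned prediction.

    Hypothetical setup (not from the paper): a hidden intent z in {-1, +1}
    determines both a noisy 1-D "trajectory" and the outcome to predict.
    """
    rng = random.Random(seed)
    intents = [rng.choice([-1.0, 1.0]) for _ in range(n)]
    # The trajectory is a noisy observation of the intent (assumed noise level).
    trajectories = [z + rng.gauss(0.0, 0.1) for z in intents]
    # In this toy world the outcome is fully determined by the intent.
    outcomes = intents

    # Unconditioned predictor: the single guess that minimizes squared error
    # is the mean of the outcomes, which sits between the two futures.
    guess = mean(outcomes)
    mse_uncond = mean((y - guess) ** 2 for y in outcomes)

    # Conditioned predictor: commit to a future using the sign of the trajectory.
    mse_cond = mean(
        (y - (1.0 if t > 0 else -1.0)) ** 2
        for y, t in zip(outcomes, trajectories)
    )
    return guess, mse_uncond, mse_cond


if __name__ == "__main__":
    guess, mse_uncond, mse_cond = simulate()
    print(f"unconditioned guess: {guess:.3f}, MSE: {mse_uncond:.3f}")
    print(f"trajectory-conditioned MSE: {mse_cond:.3f}")
```

In this toy setting the unconditioned guess lands near zero, matching neither future, whereas the conditioned predictor recovers the outcome. The abstract's second finding, that the trajectory itself is partially predictable from context, is not modeled here.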
