FIction: 4D Future Interaction Prediction from Video

Kumar Ashutosh; Georgios Pavlakos; Kristen Grauman

arXiv:2412.00932 · cs.CV · April 15, 2025

Open Access

TL;DR

FIction is a novel model that predicts 4D future human-object interactions from videos, including the objects, their 3D locations, and the interaction methods, surpassing previous 2D-based approaches.

Contribution

Introduces FIction, a model that fuses past video data to predict 3D interaction locations and methods, advancing activity understanding in 4D space.

Findings

01

Outperforms prior models with over 30% relative gains

02

Accurately predicts objects, locations, and interaction methods in 4D

03

Effective across diverse activities and real-world environments
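
The "relative gains" in the first finding can be read as improvement measured as a fraction of the baseline score. The sketch below shows that arithmetic only. The function name `relative_gain`, its `higher_is_better` flag, and the example scores are illustrative assumptions, not values or code from the paper.

```python
def relative_gain(baseline: float, ours: float, higher_is_better: bool = True) -> float:
    """Improvement of `ours` over `baseline`, as a fraction of the baseline score."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    # For error-style metrics (lower is better), a drop in the score counts as a gain.
    delta = ours - baseline if higher_is_better else baseline - ours
    return delta / abs(baseline)


# Hypothetical scores, NOT taken from the paper.
gain = relative_gain(baseline=0.20, ours=0.26)
print(f"relative gain: {gain:.0%}")
```

A relative gain differs from an absolute one: in this made-up case the score rises by 0.06 in absolute terms, which is about 30% of the baseline.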

Abstract

Anticipating how a person will interact with objects in an environment is essential for activity understanding, but existing methods are limited to the 2D space of video frames, capturing physically ungrounded predictions of "what" and ignoring the "where" and "how". We introduce FIction for 4D future interaction prediction from videos. Given an input video of a human activity, the goal is to predict which objects at what 3D locations the person will interact with in the next time period (e.g., cabinet, fridge), and how they will execute that interaction (e.g., poses for bending, reaching, pulling). Our novel model FIction fuses the past video observation of the person's actions and their environment to predict both the "where" and "how" of future interactions. Through comprehensive experiments on a variety of activities and real-world environments in EgoExo4D, we show that our proposed…

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Video Surveillance and Tracking Methods