The N-Body Problem: Parallel Execution from Single-Person Egocentric Video

Zhifan Zhu; Yifei Huang; Yoichi Sato; Dima Damen

arXiv:2512.11393 · cs.CV · December 15, 2025

TL;DR

This paper formalizes the N-Body Problem in egocentric videos, proposing a structured prompting method for Vision-Language Models to generate feasible parallel activity plans, significantly improving action coverage and reducing conflicts.

Contribution

It introduces the N-Body Problem framework, new evaluation metrics, and a structured prompting strategy for VLMs to reason about multi-agent task execution from a single egocentric video.

Findings

01 Action coverage increased by 45% for N=2
02 Collision rates reduced by 55%
03 Object and causal conflicts reduced by 45-55%

Abstract

Humans can intuitively parallelise complex activities, but can a model learn this from observing a single person? Given one egocentric video, we introduce the N-Body Problem: how N individuals can hypothetically perform the same set of tasks observed in this video. The goal is to maximise speed-up, but naive assignment of video segments to individuals often violates real-world constraints, leading to physically impossible scenarios like two people using the same object or occupying the same space. To address this, we formalise the N-Body Problem and propose a suite of metrics to evaluate both performance (speed-up, task coverage) and feasibility (spatial collisions, object conflicts and causal constraints). We then introduce a structured prompting strategy that guides a Vision-Language Model (VLM) to reason about the 3D environment, object usage, and temporal dependencies to produce a…
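
To make the metric families named in the abstract concrete, the sketch below scores a toy parallel plan. It is an illustration under simplifying assumptions, not the paper's definitions: the names `Segment`, `schedule`, `evaluate`, and `must_precede` are hypothetical, each individual is assumed to run their assigned segments back to back, speed-up is taken as total observed duration divided by the longest individual timeline, and a spatial collision is approximated as two individuals occupying the same named zone at overlapping times.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Segment:
    """One observed video segment (hypothetical representation)."""
    name: str
    duration: float
    objects: frozenset = frozenset()  # objects the segment uses
    zone: str = ""                    # coarse location label


def schedule(plan):
    """Run each individual's segments back to back, starting at t=0.

    plan: one list of Segment per individual.
    Returns {segment name: (individual index, start, end, Segment)}.
    """
    times = {}
    for person, segs in enumerate(plan):
        t = 0.0
        for seg in segs:
            times[seg.name] = (person, t, t + seg.duration, seg)
            t += seg.duration
    return times


def evaluate(segments, plan, must_precede=()):
    """Score a plan on simplified performance and feasibility metrics."""
    times = schedule(plan)
    total = sum(s.duration for s in segments)
    makespan = max((end for _, _, end, _ in times.values()), default=0.0)
    covered = sum(1 for s in segments if s.name in times)

    object_conflicts = collisions = 0
    for a, b in combinations(times, 2):
        pa, sa, ea, seg_a = times[a]
        pb, sb, eb, seg_b = times[b]
        # Only segments run by different individuals at overlapping times can clash.
        if pa == pb or not (sa < eb and sb < ea):
            continue
        if seg_a.objects & seg_b.objects:
            object_conflicts += 1
        if seg_a.zone and seg_a.zone == seg_b.zone:
            collisions += 1

    # A causal violation: the dependent segment starts before its prerequisite ends.
    causal_violations = sum(
        1
        for first, then in must_precede
        if first in times and then in times and times[then][1] < times[first][2]
    )
    return {
        "speed_up": total / makespan if makespan else 0.0,
        "coverage": covered / len(segments) if segments else 0.0,
        "collisions": collisions,
        "object_conflicts": object_conflicts,
        "causal_violations": causal_violations,
    }


# Toy kitchen example (invented data, N=2).
wash = Segment("wash", 4, frozenset({"sink"}), "sink")
chop = Segment("chop", 3, frozenset({"knife", "board"}), "counter")
boil = Segment("boil", 5, frozenset({"pot", "stove"}), "stove")
plate = Segment("plate", 2, frozenset({"pot"}), "counter")

result = evaluate(
    [wash, chop, boil, plate],
    plan=[[wash, plate], [chop, boil]],
    must_precede=[("boil", "plate")],
)
print(result)
```

In this toy plan the naive split looks fast, but plating overlaps with boiling: both need the pot, and plating should only start once boiling has finished. That is the kind of infeasibility the feasibility metrics are meant to expose alongside raw speed-up.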

Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Social Robot Interaction and HRI