MOSIV: Multi-Object System Identification from Videos

Chunjiang Liu; Xiaoyuan Wang; Qingran Lin; Albert Xiao; Haoyu Chen; Shizheng Wen; Hao Zhang; Lu Qi; Ming-Hsuan Yang; Laszlo A. Jeni; Min Xu; Yizhou Zhao

arXiv:2603.06022 · cs.CV · March 9, 2026

TL;DR

MOSIV identifies per-object material parameters from videos of multi-object scenes. It optimizes continuous parameters through a differentiable simulator, guided by geometric objectives derived from the video, rather than classifying objects into a fixed set of material prototypes.

Contribution

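Introduces the task of multi-object system identification from videos and MOSIV, a framework that addresses it by optimizing continuous, per-object material parameters under geometric objectives. Also contributes a synthetic benchmark of contact-rich, multi-object interactions for evaluation.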

Findings

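01. MOSIV improves grounding accuracy over adapted baselines on the proposed synthetic benchmark.

02. MOSIV achieves higher long-horizon simulation fidelity than the adapted baselines.

03. Object-level, fine-grained supervision and geometry-aligned objectives are critical for stable optimization.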

Abstract

We introduce the challenging problem of multi-object system identification from videos, for which prior methods are ill-suited due to their focus on single-object scenes or discrete material classification with a fixed set of material prototypes. To address this, we propose MOSIV, a new framework that directly optimizes for continuous, per-object material parameters using a differentiable simulator guided by geometric objectives derived from video. We also present a new synthetic benchmark with contact-rich, multi-object interactions to facilitate evaluation. On this benchmark, MOSIV substantially improves grounding accuracy and long-horizon simulation fidelity over adapted baselines, establishing it as a strong baseline for this new task. Our analysis shows that object-level fine-grained supervision and geometry-aligned objectives are critical for stable optimization in these complex,…
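The core idea in the abstract, fitting continuous per-object parameters by descending a trajectory-matching loss through a simulator, can be illustrated with a toy sketch. This is not the MOSIV implementation, and everything in it is an assumption made for illustration: two unit masses on anchor springs stand in for the objects, a known coupling spring stands in for contact, recorded positions stand in for the geometric targets derived from video, and central finite differences stand in for the gradients a differentiable simulator would supply. The function names (`simulate`, `loss`, `grad`, `identify`) are hypothetical.

```python
# Toy stand-in for system identification through a simulator (not MOSIV itself).

def simulate(stiffness, steps=100, dt=0.01, coupling=0.5):
    """Roll out two unit masses on anchor springs, joined by a coupling spring."""
    k1, k2 = stiffness
    x1, x2 = 1.0, -1.0  # assumed initial positions
    v1, v2 = 0.0, 0.0   # assumed initial velocities
    traj = []
    for _ in range(steps):
        # Each object feels its own spring plus the interaction term.
        a1 = -k1 * x1 + coupling * (x2 - x1)
        a2 = -k2 * x2 - coupling * (x2 - x1)
        # Semi-implicit Euler: update velocities first, then positions.
        v1 += dt * a1
        v2 += dt * a2
        x1 += dt * v1
        x2 += dt * v2
        traj.append((x1, x2))
    return traj


def loss(stiffness, observed):
    """Per-object mean squared position error, summed over both objects."""
    sim = simulate(stiffness, steps=len(observed))
    total = 0.0
    for s, o in zip(sim, observed):
        total += (s[0] - o[0]) ** 2 + (s[1] - o[1]) ** 2
    return total / len(observed)


def grad(stiffness, observed, eps=1e-4):
    """Central finite differences, standing in for autodiff gradients."""
    g = []
    for i in range(len(stiffness)):
        hi, lo = list(stiffness), list(stiffness)
        hi[i] += eps
        lo[i] -= eps
        g.append((loss(hi, observed) - loss(lo, observed)) / (2 * eps))
    return g


def identify(observed, init, iters=300, step0=20.0):
    """Gradient descent with step halving; a step is kept only if it lowers the loss."""
    k = list(init)
    for _ in range(iters):
        g = grad(k, observed)
        current = loss(k, observed)
        step = step0
        while step > 1e-8:
            # Clamp so stiffness stays positive.
            cand = [max(1e-3, ki - step * gi) for ki, gi in zip(k, g)]
            if loss(cand, observed) < current:
                k = cand
                break
            step *= 0.5
    return k


k_true = [4.0, 9.0]           # hidden per-object parameters (toy values)
k_init = [6.0, 6.0]           # initial guess
observed = simulate(k_true)   # plays the role of video-derived targets
k_est = identify(observed, k_init)
print("estimated stiffness:", k_est)
print("final loss:", loss(k_est, observed))
```

Because the loss sums a separate error term per object, each parameter receives its own supervision signal even though the coupling term makes both trajectories depend on both parameters. This loosely mirrors the object-level supervision the abstract describes, in a far simpler setting.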

Code & Models

Datasets

Hanibel/MOSIV
dataset · 82 downloads

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Generative Adversarial Networks and Image Synthesis