Tracking and Understanding Object Transformations

Yihong Sun; Xinyu Yang; Jennifer J. Sun; Bharath Hariharan

arXiv:2511.04678 · cs.CV · January 15, 2026

TL;DR

This paper introduces the task of tracking objects through transformations and proposes TubeletGraph, a zero-shot system that keeps track of objects across state changes and describes those changes. It also introduces a new benchmark dataset, VOST-TAS.

Contribution

The paper presents TubeletGraph, a zero-shot system for tracking objects through transformations, and introduces VOST-TAS, a benchmark dataset for this task.

Findings

01

TubeletGraph achieves state-of-the-art performance in tracking through transformations.

02

Beyond tracking, the system detects and describes object state changes.

03

TubeletGraph shows promising capabilities in temporal grounding and semantic reasoning.

Abstract

Real-world objects frequently undergo state transformations. From an apple being cut into pieces to a butterfly emerging from its cocoon, tracking through these changes is important for understanding real-world objects and dynamics. However, existing methods often lose track of the target object after transformation, due to significant changes in object appearance. To address this limitation, we introduce the task of Track Any State: tracking objects through transformations while detecting and describing state changes, accompanied by a new benchmark dataset, VOST-TAS. To tackle this problem, we present TubeletGraph, a zero-shot system that recovers missing objects after transformation and maps out how object states are evolving over time. TubeletGraph first identifies potentially overlooked tracks, and determines whether they should be integrated based on semantic and proximity priors.…
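The integration step mentioned in the abstract (deciding whether an overlooked track belongs to the target, based on semantic and proximity priors) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Tubelet` container, the cosine-similarity and box-overlap tests, and both thresholds are hypothetical stand-ins chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Tubelet:
    """Hypothetical summary of one track: a feature vector and a bounding box."""
    name: str
    feature: tuple   # stand-in for a semantic embedding (assumption)
    box: tuple       # (x1, y1, x2, y2) where the track first appears (assumption)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_candidates(target, candidates, sem_thresh=0.7, prox_thresh=0.1):
    """Keep candidates that pass both the semantic and the proximity test.

    Both thresholds are illustrative values, not taken from the paper.
    """
    return [
        c for c in candidates
        if cosine(target.feature, c.feature) >= sem_thresh
        and box_iou(target.box, c.box) >= prox_thresh
    ]
```

In this sketch a candidate is merged only when it both looks like the target and appears near it, so a nearby but unrelated object (a knife next to an apple) and a similar but distant object are both rejected.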

Code & Models

Datasets

yihongs/VOST-TAS
dataset · 15k dl

Videos

Tracking and Understanding Object Transformations · SlidesLive

Taxonomy

Topics: Video Surveillance and Tracking Methods · Domain Adaptation and Few-Shot Learning · Gaze Tracking and Assistive Technology