GRASPTrack: Geometry-Reasoned Association via Segmentation and Projection for Multi-Object Tracking

Xudong Han; Pengcheng Fang; Yueying Tian; Jianhui Yu; Xiaohao Cai; Daniel Roggen; Philip Birch

arXiv:2508.08117 · cs.CV · August 12, 2025

TL;DR

GRASPTrack introduces a depth-aware multi-object tracking framework that leverages monocular depth estimation and 3D geometric reasoning to improve robustness in occluded and complex scenes.

Contribution

GRASPTrack integrates monocular depth estimation and instance segmentation into a tracking-by-detection pipeline, enabling explicit 3D reasoning and robust association through a novel voxel-based IoU and depth-aware adaptive noise compensation.
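As an illustration of the voxel-based IoU idea, the following is a minimal sketch, not the authors' implementation. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a per-pixel metric depth map, and a boolean instance mask; the function names and the `voxel_size` default are invented for this example.

```python
import numpy as np

def backproject(depth, mask, fx, fy, cx, cy):
    """Lift masked pixels to 3D camera coordinates (pinhole model, assumed)."""
    v, u = np.nonzero(mask)          # pixel rows (v) and columns (u) inside the mask
    z = depth[v, u]                  # estimated depth at those pixels
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)   # shape (N, 3)

def voxelize(points, voxel_size):
    """Return the set of occupied integer voxel indices for an (N, 3) cloud."""
    idx = np.floor(points / voxel_size).astype(np.int64)
    return {tuple(row) for row in idx.tolist()}

def voxel_iou(points_a, points_b, voxel_size=0.1):
    """IoU of the occupied-voxel sets of two point clouds."""
    a = voxelize(points_a, voxel_size)
    b = voxelize(points_b, voxel_size)
    union = len(a | b)
    return len(a & b) / union if union else 0.0
```

In a tracker, a score like this could replace or complement 2D box IoU in the association cost matrix, so that two objects whose boxes overlap in the image but lie at different depths receive a low similarity.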

Findings

1. Achieves competitive performance on MOT17, MOT20, and DanceTrack benchmarks.
2. Significantly improves tracking robustness in occlusion-heavy scenes.
3. Enhances motion association with 3D motion cues.

Abstract

Multi-object tracking (MOT) in monocular videos is fundamentally challenged by occlusions and depth ambiguity, issues that conventional tracking-by-detection (TBD) methods struggle to resolve owing to a lack of geometric awareness. To address these limitations, we introduce GRASPTrack, a novel depth-aware MOT framework that integrates monocular depth estimation and instance segmentation into a standard TBD pipeline to generate high-fidelity 3D point clouds from 2D detections, thereby enabling explicit 3D geometric reasoning. These 3D point clouds are then voxelized to enable a precise and robust Voxel-Based 3D Intersection-over-Union (IoU) for spatial association. To further enhance tracking robustness, our approach incorporates Depth-aware Adaptive Noise Compensation, which dynamically adjusts the Kalman filter process noise based on occlusion severity for more reliable state…
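The abstract does not give the formula behind Depth-aware Adaptive Noise Compensation, so the sketch below only illustrates the general idea of inflating Kalman process noise as occlusion grows. Everything in it is an assumption made for illustration: the occlusion measure (largest fraction of a box covered by any single nearer box), the linear scaling rule, and the `alpha` parameter.

```python
import numpy as np

def occlusion_severity(box, depth, others):
    """Largest fraction of `box` covered by any box nearer to the camera.

    box: (x1, y1, x2, y2); depth: scalar depth of the object;
    others: list of (box, depth) pairs. Hypothetical measure in [0, 1].
    """
    x1, y1, x2, y2 = box
    area = max(x2 - x1, 0.0) * max(y2 - y1, 0.0)
    if area == 0.0:
        return 0.0
    worst = 0.0
    for (ox1, oy1, ox2, oy2), odepth in others:
        if odepth >= depth:          # only nearer objects can occlude
            continue
        iw = max(min(x2, ox2) - max(x1, ox1), 0.0)
        ih = max(min(y2, oy2) - max(y1, oy1), 0.0)
        worst = max(worst, iw * ih / area)
    return worst

def predict(x, P, F, Q, occlusion, alpha=4.0):
    """Kalman predict step with process noise scaled by occlusion (assumed rule)."""
    Q_adapt = (1.0 + alpha * occlusion) * Q   # more occlusion, less trust in the motion model
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q_adapt
    return x_pred, P_pred
```

With `occlusion = 0` this reduces to the standard Kalman prediction; as occlusion approaches 1 the predicted covariance grows, which widens the gate for re-associating a track once the object reappears.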

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Human Pose and Action Recognition