SAMOFT: Robust Multi-Object Tracking via Region and Flow
Yanchao Wang, Dawei Zhang, Chengzhuan Yang, Wei Liu, Minglu Li, Hua Wang, Zhonglong Zheng, Ming-Hsuan Yang

TL;DR
SAMOFT is a multi-object tracker built on pixel-level cues. It integrates dense optical flow, the Segment Anything Model (SAM), and dynamic correction modules to stay robust under complex motion, occlusion, and deformation.
Contribution
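The paper proposes SAMOFT, a multi-object tracker that leverages pixel-level cues rather than relying only on instance-level object features. Its new modules, including Pixel Motion Matching (PMM) and Centroid Distance Matching (CDM), target robustness and accuracy under object deformation, nonlinear motion, and occlusion.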
Findings
Outperforms baseline trackers on DanceTrack and MOTChallenge benchmarks.
Achieves state-of-the-art performance with robust handling of occlusion and deformation.
Demonstrates effectiveness of pixel-level cues in complex motion scenarios.
Abstract
Multi-object tracking (MOT) is a fundamental task in computer vision that requires continuously tracking multiple targets while maintaining consistent identities across frames. However, most existing approaches primarily rely on instance-level object features for trajectory association, which often leads to degraded performance under challenging conditions such as object deformation, nonlinear motion, and occlusion. In this work, we propose SAMOFT, a robust tracker that leverages pixel-level cues to improve robustness under complex motion scenarios. Specifically, we introduce a Pixel Motion Matching (PMM) module that integrates the Segment Anything Model (SAM) with dense optical flow to refine Kalman filter-based motion prediction using instantaneous foreground pixel motion. To further enhance robustness under unreliable detections, we design a Centroid Distance Matching (CDM) module…
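The PMM idea described in the abstract, refining a Kalman filter prediction with the instantaneous motion of foreground pixels, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of the median as the motion summary, and the linear blending `weight` are all assumptions made for illustration. The sketch also assumes that a boolean foreground mask (for example, from SAM) and a dense flow field of shape `(H, W, 2)` are already available.

```python
import numpy as np


def foreground_motion(flow, mask):
    """Median (dx, dy) of the flow vectors under a boolean foreground mask.

    flow: float array of shape (H, W, 2), per-pixel displacement to the next frame.
    mask: bool array of shape (H, W), True on the object's foreground pixels.
    Returns None when the mask is empty.
    """
    vectors = flow[mask]  # shape (N, 2)
    if vectors.size == 0:
        return None
    return np.median(vectors, axis=0)


def refine_prediction(prev_box, kalman_box, flow, mask, weight=0.5):
    """Blend a Kalman-predicted box with a flow-propagated box.

    prev_box, kalman_box: (x1, y1, x2, y2) in pixels.
    weight: hypothetical blending factor (assumption); 0 keeps the Kalman
            prediction, 1 trusts the pixel motion entirely.
    """
    prev_box = np.asarray(prev_box, dtype=float)
    kalman_box = np.asarray(kalman_box, dtype=float)

    motion = foreground_motion(flow, mask)
    if motion is None:
        # No foreground pixels: fall back to the Kalman prediction.
        return kalman_box

    dx, dy = motion
    # Shift the previous box by the object's own pixel motion.
    flow_box = prev_box + np.array([dx, dy, dx, dy])
    return (1.0 - weight) * kalman_box + weight * flow_box


# Usage: the object moves 4 px right and 2 px up, but the Kalman
# prediction (here, "no motion") has not caught up yet.
flow = np.zeros((10, 10, 2))
flow[..., 0] = 4.0
flow[..., 1] = -2.0
mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 2:5] = True

refined = refine_prediction([2, 2, 5, 5], [2, 2, 5, 5], flow, mask, weight=0.5)
```

Restricting the motion estimate to masked pixels keeps background flow from contaminating it, and the median limits the influence of outlier vectors near the mask boundary. How the actual method weights or combines the two predictions is not stated in the text above.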
