SAMOFT: Robust Multi-Object Tracking via Region and Flow

Yanchao Wang; Dawei Zhang; Chengzhuan Yang; Wei Liu; Minglu Li; Hua Wang; Zhonglong Zheng; Ming-Hsuan Yang

arXiv:2605.09417 · cs.CV · May 12, 2026

TL;DR

SAMOFT is a multi-object tracker that uses pixel-level cues, integrating optical flow, SAM, and dynamic correction modules, to stay robust under complex motion, occlusion, and deformation.

Contribution

The paper proposes SAMOFT, a multi-object tracker that uses pixel-level information, through modules such as Pixel Motion Matching (PMM) and Centroid Distance Matching (CDM), to improve robustness and accuracy in challenging scenarios.

Findings

01. Outperforms baseline trackers on the DanceTrack and MOTChallenge benchmarks.

02. Achieves state-of-the-art performance with robust handling of occlusion and deformation.

03. Demonstrates the effectiveness of pixel-level cues in complex motion scenarios.

Abstract

Multi-object tracking (MOT) is a fundamental task in computer vision that requires continuously tracking multiple targets while maintaining consistent identities across frames. However, most existing approaches primarily rely on instance-level object features for trajectory association, which often leads to degraded performance under challenging conditions such as object deformation, nonlinear motion, and occlusion. In this work, we propose SAMOFT, a robust tracker that leverages pixel-level cues to improve robustness under complex motion scenarios. Specifically, we introduce a Pixel Motion Matching (PMM) module that integrates the Segment Anything Model (SAM) with dense optical flow to refine Kalman filter-based motion prediction using instantaneous foreground pixel motion. To further enhance robustness under unreliable detections, we design a Centroid Distance Matching (CDM) module…
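To make the PMM idea in the abstract more concrete, here is a minimal sketch of how foreground pixel motion could refine a Kalman-predicted box. It is not the authors' implementation: the function names `foreground_flow` and `refine_prediction`, the use of the median as the motion summary, and the fixed blending `weight` are all assumptions made for illustration. The mask stands in for a SAM segmentation and the flow field for the output of a dense optical flow estimator; both are plain nested lists here so the example runs without extra dependencies.

```python
from statistics import median


def foreground_flow(flow, mask):
    """Return the median (dx, dy) over pixels where mask is truthy.

    flow: H x W grid of (dx, dy) tuples (stand-in for dense optical flow).
    mask: H x W grid of 0/1 values (stand-in for a SAM foreground mask).
    Returns None when the mask selects no pixels.
    """
    dxs, dys = [], []
    for flow_row, mask_row in zip(flow, mask):
        for (dx, dy), is_fg in zip(flow_row, mask_row):
            if is_fg:
                dxs.append(dx)
                dys.append(dy)
    if not dxs:
        return None
    return median(dxs), median(dys)


def refine_prediction(prev_box, kalman_box, flow, mask, weight=0.5):
    """Blend a Kalman-predicted box with a flow-shifted copy of the previous box.

    Boxes are (x1, y1, x2, y2). The blending rule and `weight` are
    hypothetical; the paper's actual refinement rule may differ.
    """
    motion = foreground_flow(flow, mask)
    if motion is None:
        # No foreground pixels: fall back to the Kalman prediction alone.
        return tuple(kalman_box)
    dx, dy = motion
    flow_box = (prev_box[0] + dx, prev_box[1] + dy,
                prev_box[2] + dx, prev_box[3] + dy)
    return tuple((1 - weight) * k + weight * f
                 for k, f in zip(kalman_box, flow_box))


# Toy 2 x 2 frame: the top row is foreground and moves 2 px to the right.
flow = [[(2.0, 0.0), (2.0, 0.0)],
        [(0.0, 0.0), (0.0, 0.0)]]
mask = [[1, 1],
        [0, 0]]
refined = refine_prediction((0, 0, 10, 10), (1, 0, 11, 10), flow, mask)
print(refined)
```

In this toy case the Kalman filter predicts a 1-pixel shift while the foreground pixels move 2 pixels, so the blended box lands halfway between the two. Restricting the flow statistics to masked pixels is what keeps background motion from contaminating the estimate.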
