FastTracker: Real-Time and Accurate Visual Tracking
Hamidreza Hashempoor, Yu Dong Hwang

TL;DR
FastTracker introduces a generalized real-time tracking framework that effectively handles multiple object types, especially vehicles, by incorporating occlusion-aware re-identification and scene-aware trajectory refinement, supported by a new diverse vehicle dataset.
Contribution
The paper presents a novel multi-object tracking method with occlusion and scene priors, and introduces a new vehicle tracking benchmark dataset.
Findings
Achieves high HOTA scores of 66.4 on MOT17 and 65.7 on MOT20.
Demonstrates robustness across diverse datasets and object categories.
Outperforms existing methods in vehicle tracking accuracy.
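For readers unfamiliar with the metric, HOTA at a single localization threshold is the geometric mean of detection accuracy (DetA) and association accuracy (AssA), and the reported score averages this over a range of thresholds. The sketch below only illustrates that combination. The function names and input values are illustrative assumptions, not taken from the paper or its evaluation code.

```python
import math
from typing import Sequence


def hota_at_alpha(det_a: float, ass_a: float) -> float:
    """HOTA at one localization threshold: geometric mean of DetA and AssA."""
    return math.sqrt(det_a * ass_a)


def hota(det_as: Sequence[float], ass_as: Sequence[float]) -> float:
    """Average the per-threshold HOTA values (one DetA/AssA pair per threshold)."""
    if len(det_as) != len(ass_as) or len(det_as) == 0:
        raise ValueError("need one DetA and one AssA value per threshold")
    per_alpha = [hota_at_alpha(d, a) for d, a in zip(det_as, ass_as)]
    return sum(per_alpha) / len(per_alpha)


# Illustrative values only (not results from the paper).
example_score = hota([0.64, 0.25], [0.36, 0.25])
```

Because of the geometric mean, a tracker cannot reach a high HOTA by excelling at detection alone; association quality, which ID switches degrade, weighs equally.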
Abstract
Conventional multi-object tracking (MOT) systems are predominantly designed for pedestrian tracking and often exhibit limited generalization to other object categories. This paper presents a generalized tracking framework capable of handling multiple object types, with a particular emphasis on vehicle tracking in complex traffic scenes. The proposed method incorporates two key components: (1) an occlusion-aware re-identification mechanism that enhances identity preservation for heavily occluded objects, and (2) a road-structure-aware tracklet refinement strategy that utilizes semantic scene priors such as lane directions, crosswalks, and road boundaries to improve trajectory continuity and accuracy. In addition, we introduce a new benchmark dataset comprising diverse vehicle classes with frame-level tracking annotations, specifically curated to support evaluation of vehicle-focused…
Peer Reviews
Decision · Submitted to ICLR 2026
1. The idea of handling occlusion situations and utilizing scene information makes sense. 2. The implementation of the occlusion handling and the scene prior constraints is reasonable. 3. The collected dataset can be helpful for the community. 4. The proposed method is effective on public benchmarks and the new one.
1. Experiments on efficiency are lacking. The paper claims that the tracker is real-time, but related experiments, like running speed, computational burden, etc., are missing. These experiments are needed to support the claim. 2. Methodology contribution is limited. This work focuses on the Kalman Filter-based data association process, and improves the process from multiple aspects. However, these improvements are more like engineering optimization, instead of methodology contribution from my vi
1. Clear problem motivation: addresses generalization beyond pedestrian tracking and the need for multi-class vehicle-centric tracking under occlusions and complex layouts. 2. Practical, lightweight design: avoids deep appearance models in the online pipeline; relies on motion, geometry, and simple heuristics that are attractive for real-time deployments. 3. Environment-aware modeling: novel use of region semantics and directional constraints to limit drift and enforce plausible motion without h
1. Clarity and correctness of some definitions: The “center-proximity score CP” is described as computed via IoU, which is conceptually inconsistent (center-proximity is not IoU). A precise definition is missing. 2. Occlusion handling design details: Marking occlusion based on overlap with other active tracklets via a single threshold may conflate crowding with occlusion and induce false occlusion states. 3. Dataset details and release: The FastTrack dataset has only 12 videos (albeit very dense
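To make the reviewer's first two points concrete, the sketch below contrasts IoU with one possible center-proximity score and shows a single-threshold occlusion flag. It is an assumption-laden illustration, not the paper's definition: the paper's CP formula is not given here, and `center_proximity`, `covered_fraction`, `is_occluded`, and the 0.7 threshold are hypothetical.

```python
import math
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def _area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def _intersection(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union of two axis-aligned boxes."""
    inter = _intersection(a, b)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def center_proximity(a: Box, b: Box) -> float:
    """Hypothetical score in (0, 1]: 1 when centers coincide, decaying with
    center distance normalized by the mean box diagonal."""
    dist = math.hypot((a[0] + a[2] - b[0] - b[2]) / 2, (a[1] + a[3] - b[1] - b[3]) / 2)
    diag = 0.5 * (math.hypot(a[2] - a[0], a[3] - a[1]) + math.hypot(b[2] - b[0], b[3] - b[1]))
    return 1.0 / (1.0 + dist / diag) if diag > 0 else 0.0


def covered_fraction(target: Box, other: Box) -> float:
    """Fraction of the target box area that is covered by the other box."""
    area = _area(target)
    return _intersection(target, other) / area if area > 0 else 0.0


def is_occluded(target: Box, others: Iterable[Box], thresh: float = 0.7) -> bool:
    """Single-threshold flag (threshold value is an assumption)."""
    return any(covered_fraction(target, o) >= thresh for o in others)


# A small box centered inside a large one: identical centers, very low IoU.
big, small = (0, 0, 10, 10), (4, 4, 6, 6)
```

For `big` and `small`, the center-proximity score is 1.0 while IoU is only 0.04, which is why describing one in terms of the other is inconsistent. The flag also fires on overlap alone, without knowing which object is in front, so dense crowds can trigger it even when the target remains visible.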
- Comprehensive Experiments: The authors evaluate on a wide range of benchmarks (MOT16/17/20, DanceTrack, and FastTrack) with detailed ablation studies (Tables 2–5) verifying each module’s impact. The improvements in key metrics (MOTA, HOTA, IDF1) and especially the reduction in ID switches (e.g. lowest IDs on MOT17/20) are convincingly shown. - Strong Empirical Results: FastTracker achieves state-of-the-art or competitive performance on multiple datasets. For instance, it reaches HOTA 66.4 on
- Manual ROI/Direction Constraints: A key limitation is the reliance on manually defined polygonal ROIs and fixed cone directions for scene priors. As acknowledged by the authors, this is labor-intensive and may not generalize to complex or evolving environments (e.g. intersections, roundabouts). The current system only supports quadrilateral regions, limiting flexibility. This reliance diminishes the novelty and practicality of the road-structure module. - Lack of Runtime Analysis: The claim o
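The scene prior the reviewer describes, a manually drawn quadrilateral ROI paired with a fixed direction cone, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the ROI is assumed convex with vertices in consistent order, and the function names and the 30-degree half-angle are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_in_convex_quad(p: Point, quad: Sequence[Point]) -> bool:
    """True if p is inside or on a convex quadrilateral (vertices in order)."""
    crosses = []
    for i in range(4):
        ax, ay = quad[i]
        bx, by = quad[(i + 1) % 4]
        # Sign of the cross product tells which side of edge a->b the point is on.
        crosses.append((bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax))
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)


def within_direction_cone(velocity: Point, lane_dir: Point, half_angle_deg: float) -> bool:
    """True if the velocity deviates from lane_dir by at most half_angle_deg."""
    vn = math.hypot(velocity[0], velocity[1])
    dn = math.hypot(lane_dir[0], lane_dir[1])
    if vn == 0 or dn == 0:
        return True  # stationary object or undefined direction: no constraint
    cos_angle = (velocity[0] * lane_dir[0] + velocity[1] * lane_dir[1]) / (vn * dn)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) <= half_angle_deg


def motion_is_plausible(pos: Point, velocity: Point, quad: Sequence[Point],
                        lane_dir: Point, half_angle_deg: float = 30.0) -> bool:
    """Apply the direction prior only while the object is inside the ROI."""
    if not point_in_convex_quad(pos, quad):
        return True
    return within_direction_cone(velocity, lane_dir, half_angle_deg)


# Hypothetical lane region with traffic flowing in the +x direction.
lane_roi = [(0, 0), (10, 0), (10, 10), (0, 10)]
wrong_way = motion_is_plausible((5, 5), (-1, 0), lane_roi, (1, 0))
```

Here `wrong_way` is `False` because the motion opposes the lane direction inside the ROI. The sketch also makes the reviewer's concern visible: every region and direction must be specified by hand, and a single fixed cone per quadrilateral does not describe curved paths such as those in a roundabout.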
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Computing and Algorithms · Gaze Tracking and Assistive Technology
