UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
Bishoy Galoaa, Xiangyu Bai, Utsav Nandi, Sai Siddhartha Vivek Dhir Rangoju, Somaieh Amraee, Sarah Ostadabbas

TL;DR
UniTrack introduces a universal, differentiable graph-based loss for multi-object tracking that improves existing models by optimizing tracking-specific objectives end-to-end.
Contribution
It provides a novel, plug-and-play graph-theoretic loss function that enhances multi-object tracking performance without modifying existing architectures.
Findings
Up to 53% reduction in identity switches
12% improvement in IDF1 score
9.7% MOTA increase on SportsMOT
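These findings use the standard MOT metrics. The sketch below shows how MOTA (CLEAR MOT accuracy), IDF1 (identity F1), and a relative identity-switch reduction are computed from error counts. The counts in the usage example are invented for illustration and are not UniTrack results.

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / number of ground-truth boxes."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`, e.g. for identity-switch counts."""
    return (before - after) / before


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
print(mota(fn=100, fp=50, idsw=50, num_gt=1000))  # 1 - 200/1000
print(idf1(idtp=800, idfp=200, idfn=200))         # 1600/2000
print(relative_reduction(100, 47))                # 100 -> 47 switches is a 53% reduction
```

Identity switches enter MOTA only as one of three error counts. IDF1, in contrast, directly measures how long identities are preserved, so an identity-focused loss would typically move IDF1 more than MOTA.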
Abstract
We present UniTrack, a plug-and-play graph-theoretic loss function designed to significantly enhance multi-object tracking (MOT) performance by directly optimizing tracking-specific objectives through unified differentiable learning. Unlike prior graph-based MOT methods that redesign tracking architectures, UniTrack provides a universal training objective that integrates detection accuracy, identity preservation, and spatiotemporal consistency into a single end-to-end trainable loss function, enabling seamless integration with existing MOT systems without architectural modifications. Through differentiable graph representation learning, UniTrack enables networks to learn holistic representations of motion continuity and identity relationships across frames. We validate UniTrack across diverse tracking models and multiple challenging benchmarks, demonstrating consistent improvements…
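The abstract describes a single loss that is added to an existing tracker's training without architectural changes. The following is a minimal sketch of that plug-in pattern, not UniTrack's formulation. It uses a soft frame-to-frame assignment (a softmax over cosine similarities of embeddings) with a cross-entropy against ground-truth identity matches as an illustrative identity-preservation term. The names `identity_assignment_loss` and `total_loss` and the `temperature` parameter are hypothetical. NumPy has no autodiff, so in practice the same operations would be written in a framework with automatic differentiation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def identity_assignment_loss(emb_t, emb_t1, ids_t, ids_t1, temperature=0.1):
    """Hypothetical identity term: cross-entropy between a soft assignment of
    objects at frame t to objects at frame t+1 and the ground-truth identity match."""
    a = emb_t / np.linalg.norm(emb_t, axis=1, keepdims=True)
    b = emb_t1 / np.linalg.norm(emb_t1, axis=1, keepdims=True)
    probs = softmax(a @ b.T / temperature, axis=1)  # rows: frame t, cols: frame t+1
    target = (ids_t[:, None] == ids_t1[None, :]).astype(float)
    has_match = target.sum(axis=1) > 0  # only objects that reappear contribute
    if not has_match.any():
        return 0.0
    p_correct = (probs * target).sum(axis=1)[has_match]
    return float(-np.log(p_correct + 1e-12).mean())


def total_loss(base_loss: float, graph_terms: dict, weights: dict) -> float:
    """Plug-and-play: the tracker's own loss plus weighted auxiliary terms."""
    return base_loss + sum(weights[k] * graph_terms[k] for k in graph_terms)


emb = np.eye(3)  # three objects with orthogonal embeddings in both frames
good = identity_assignment_loss(emb, emb, np.array([0, 1, 2]), np.array([0, 1, 2]))
bad = identity_assignment_loss(emb, emb, np.array([0, 1, 2]), np.array([1, 2, 0]))
print(good, bad)  # small when embeddings agree with identities, large otherwise
print(total_loss(1.0, {"identity": good}, {"identity": 0.2}))
```

Because the extra term only consumes outputs the tracker already produces (here, per-object embeddings), the network itself does not need to change, which is the property the abstract emphasizes.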
Peer Reviews
Decision: ICLR 2026 Poster
1. One of the biggest advantages is that the proposed method is architecture-agnostic: it yields improvements when plugged into diverse model families (end-to-end transformers, joint detection-tracking, tracking-by-detection, and global transformers). 2. The proposed unified objective is well motivated: it merges detection quality and identity preservation, which benefits end-to-end MOT training. 3. The authors present a clear ablation on error types (Table 3), showing which term combats which
1. The details of the differentiability of the flow term are not clearly conveyed. The loss is scaled by factors that depend on false positives/false negatives, but the paper does not define a differentiable surrogate for those counts. As far as I understand, the derivation treats the FP/FN counts inside the loss as constants and never explains how they are made differentiable with respect to the model outputs. In practice, FP/FN are discrete functions of predictions (they jump
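The reviewer's concern is that FP/FN counts are step functions of the confidence scores, so their gradient is zero almost everywhere. One common workaround, shown here as an assumption and not necessarily what the paper does, is to replace the hard counts with their expectations when each prediction is kept independently with probability equal to its score. In this sketch, `match` is a hypothetical binary prediction-by-ground-truth overlap matrix, and `soft_fp_fn` is an illustrative name.

```python
import numpy as np


def hard_fp_fn(scores: np.ndarray, match: np.ndarray, thresh: float = 0.5):
    """Discrete FP/FN after thresholding confidences: piecewise constant in scores."""
    kept = scores >= thresh
    has_gt = match.max(axis=1) > 0                     # prediction overlaps some GT
    fp = int(np.sum(kept & ~has_gt))
    covered = (match * kept[:, None]).max(axis=0) > 0  # GT hit by a kept prediction
    fn = int(np.sum(~covered))
    return fp, fn


def soft_fp_fn(scores: np.ndarray, match: np.ndarray):
    """Expected FP/FN if prediction i is kept independently with probability scores[i].
    Both quantities are smooth functions of the scores."""
    has_gt = match.max(axis=1)
    soft_fp = float(np.sum(scores * (1.0 - has_gt)))
    # P(GT j missed) = prod_i (1 - s_i * m_ij) under the independence assumption
    soft_fn = float(np.sum(np.prod(1.0 - scores[:, None] * match, axis=0)))
    return soft_fp, soft_fn


scores = np.array([0.9, 0.2, 0.7])
match = np.array([[1.0, 0.0],   # prediction 0 overlaps GT 0
                  [0.0, 0.0],   # prediction 1 overlaps nothing
                  [0.0, 1.0]])  # prediction 2 overlaps GT 1
print(hard_fp_fn(scores, match))  # no FP/FN after thresholding
print(soft_fp_fn(scores, match))  # roughly (0.2, 0.4)

nudged = scores + np.array([0.0, 0.1, 0.0])
print(hard_fp_fn(nudged, match))  # unchanged: no gradient signal
print(soft_fp_fn(nudged, match))  # soft FP rises to about 0.3
```

A small change in a confidence score changes the soft counts continuously, while the hard counts stay fixed until a score crosses the threshold.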
1. The idea of using a plug-and-play graph-based loss makes sense. 2. The implementation of the graph-based loss is well suited to the MOT task. 3. The analysis of the loss is reasonable. 4. The loss is effective with different trackers on multiple benchmarks, which shows its universality. Overall, this work develops a loss with both a solid theoretical foundation and clear practical improvements. I believe this work will benefit the community.
The weights of the spatial and temporal losses adapt to the graph connectivity. The authors are encouraged to compare this against alternatives, such as weights learned directly by the network and fixed weights.
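To make the suggested comparison concrete, the following sketch uses the standard graph-Laplacian smoothness penalty, tr(XᵀLX) = ½ Σᵢⱼ Aᵢⱼ‖xᵢ − xⱼ‖² with L = D − A. It pairs the penalty with one hypothetical connectivity-adaptive rule that weights each term by the inverse of its edge count. That rule is an assumption for illustration and is not taken from the paper. A fixed-weight baseline simply passes constant λs and λt instead, and a learned variant would treat them as trainable parameters.

```python
import numpy as np


def laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial graph Laplacian L = D - A for a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj


def smoothness(x: np.ndarray, adj: np.ndarray) -> float:
    """tr(X^T L X), equal to 0.5 * sum_ij A_ij * ||x_i - x_j||^2."""
    return float(np.trace(x.T @ laplacian(adj) @ x))


def connectivity_weights(adj_s: np.ndarray, adj_t: np.ndarray, eps: float = 1e-8):
    """Hypothetical adaptive rule: weight each term by 1 / (number of undirected
    edges), then normalize so that lambda_s + lambda_t = 1."""
    inv_s = 1.0 / (adj_s.sum() / 2 + eps)
    inv_t = 1.0 / (adj_t.sum() / 2 + eps)
    total = inv_s + inv_t
    return inv_s / total, inv_t / total


x = np.array([[0.0], [1.0], [3.0]])                             # one feature per node
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # edges 0-1, 1-2
print(smoothness(x, path))  # (0-1)^2 + (1-3)^2

adj_s = np.zeros((3, 3)); adj_s[0, 1] = adj_s[1, 0] = 1.0  # sparse graph: 1 edge
adj_t = np.ones((3, 3)) - np.eye(3)                       # dense graph: 3 edges
lam_s, lam_t = connectivity_weights(adj_s, adj_t)
print(lam_s, lam_t)  # the sparser graph receives the larger weight
```

Under this rule, the term computed on the denser graph, which sums over more edges, is down-weighted so that neither term dominates merely because it has more edges.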
1. UniTrack combines detection, identity, and temporal consistency into a single differentiable loss. 2. UniTrack can be applied to tracking architectures without network modification, demonstrating practical flexibility. 3. Experiments across MOT17, MOT20, SportsMOT, and DanceTrack show consistent gains (up to +9.7% MOTA, +12.3% IDF1) and reduced identity switches, highlighting the effectiveness and generality of the proposed framework.
1. The method introduces additional training complexity and memory overhead; scalability to large scenes or dense MOT scenarios remains a concern. 2. The authors should add more ablation studies on the adaptive Laplacian weighting. 3. The ablation section focuses mainly on component removal and could include a more fine-grained analysis of hyperparameters (e.g., λs and λt updates, thresholding strategies). 4. It is recommended that the authors compare UniTrack with more recent MOT approaches, esp
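The memory concern can be quantified with simple arithmetic. Assume, hypothetically, that a graph connects every detection to every other detection in the same frame and to every detection in the next `window` frames. The edge count then grows roughly quadratically with the number of detections per frame. The sketch below counts edges for a sparse scene and a crowded scene. The detection counts are illustrative, and this construction is an assumption rather than the paper's graph.

```python
def spatial_edges(dets_per_frame: list[int]) -> int:
    """Fully connected within each frame: n * (n - 1) / 2 undirected edges."""
    return sum(n * (n - 1) // 2 for n in dets_per_frame)


def temporal_edges(dets_per_frame: list[int], window: int = 1) -> int:
    """Every detection linked to every detection in the next `window` frames."""
    total = 0
    n_frames = len(dets_per_frame)
    for t in range(n_frames):
        for k in range(1, window + 1):
            if t + k < n_frames:
                total += dets_per_frame[t] * dets_per_frame[t + k]
    return total


sparse = [20] * 10    # 10 frames, 20 detections each (illustrative)
crowded = [150] * 10  # 10 frames, 150 detections each (illustrative)
print(spatial_edges(sparse), temporal_edges(sparse))    # 1900, 3600
print(spatial_edges(crowded), temporal_edges(crowded))  # 111750, 202500
```

A 7.5x increase in detections per frame produces roughly a 56x increase in temporal edges. This is why sparsification (k-nearest neighbours, gating by distance) is the usual remedy for dense scenes.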
Taxonomy
Topics: Video Surveillance and Tracking Methods · Gaze Tracking and Assistive Technology · Human Pose and Action Recognition
