UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking

Bishoy Galoaa; Xiangyu Bai; Utsav Nandi; Sai Siddhartha Vivek Dhir Rangoju; Somaieh Amraee; Sarah Ostadabbas

arXiv:2602.05037·cs.CV·February 6, 2026

Open Access · 3 Reviews

TL;DR

UniTrack introduces a universal, differentiable graph-based loss for multi-object tracking that improves existing models by optimizing tracking-specific objectives end-to-end.

Contribution

It provides a novel, plug-and-play graph-theoretic loss function that enhances multi-object tracking performance without modifying existing architectures.

Findings

01 Up to 53% reduction in identity switches

02 12% improvement in IDF1 score

03 9.7% MOTA increase on SportsMOT

Abstract

We present UniTrack, a plug-and-play graph-theoretic loss function designed to significantly enhance multi-object tracking (MOT) performance by directly optimizing tracking-specific objectives through unified differentiable learning. Unlike prior graph-based MOT methods that redesign tracking architectures, UniTrack provides a universal training objective that integrates detection accuracy, identity preservation, and spatiotemporal consistency into a single end-to-end trainable loss function, enabling seamless integration with existing MOT systems without architectural modifications. Through differentiable graph representation learning, UniTrack enables networks to learn holistic representations of motion continuity and identity relationships across frames. We validate UniTrack across diverse tracking models and multiple challenging benchmarks, demonstrating consistent improvements…

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. One of the biggest advantages is that the proposed method is architecture-agnostic: it demonstrates improvements when plugged into diverse families (end-to-end transformers, joint detection-tracking, tracking-by-detection, global transformers).
2. The proposed unified objective makes sense, as it merges detection quality and identity preservation and benefits end-to-end MOT training.
3. The authors show a clear ablation on error types (Table 3), presenting which term combats which

Weaknesses

1. The details of the differentiability of the flow term are not clearly conveyed. The loss scales by factors that depend on false positives/false negatives, but the paper does not define a differentiable surrogate for those counts. As far as I understand, the derivation treats the FP/FN counts inside the loss as constants and never explains how those counts are made differentiable with respect to the model outputs. In practice, FP/FN are discrete functions of predictions (they jump
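
The distinction Reviewer 01 raises, between discrete FP/FN counts and a differentiable surrogate, can be illustrated with a minimal sketch. This is not the paper's formulation: the function names, the fixed `matched` assignment, the 0.5 threshold, and the sigmoid temperature are all assumptions made for illustration. The hard counts use a step function on the confidence score, so they do not change under small score perturbations; the soft counts replace the step with a sigmoid, so they vary smoothly with the scores.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def hard_fp_fn(scores, matched, num_gt, thr=0.5):
    """Discrete counts: a prediction is kept only if score >= thr."""
    kept = [s >= thr for s in scores]
    fp = sum(1 for k, m in zip(kept, matched) if k and not m)
    tp = sum(1 for k, m in zip(kept, matched) if k and m)
    return fp, num_gt - tp

def soft_fp_fn(scores, matched, num_gt, thr=0.5, temperature=0.1):
    """Smooth surrogate (assumed, not from the paper): sigmoid instead of a step.

    `matched` is treated as a fixed assignment (each ground-truth object is
    matched by at most one prediction), so only the scores carry gradient.
    """
    keep = [sigmoid((s - thr) / temperature) for s in scores]
    fp = sum(k for k, m in zip(keep, matched) if not m)
    tp = sum(k for k, m in zip(keep, matched) if m)
    return fp, num_gt - tp

# Hypothetical example: three predictions, two ground-truth objects.
scores = [0.9, 0.6, 0.4]
matched = [True, False, True]
print(hard_fp_fn(scores, matched, num_gt=2))
print(soft_fp_fn(scores, matched, num_gt=2))
```

Nudging the unmatched prediction's score from 0.6 to 0.61 leaves the hard counts unchanged but raises the soft FP count slightly, which is the property a gradient-based loss would need.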

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. The idea of using a plug-and-play graph-based loss makes sense.
2. The implementation of the graph-based loss is suitable for the MOT task.
3. The analysis of the loss is reasonable.
4. The loss is effective with different trackers on multiple benchmarks, which shows the universality of the proposed loss.
Overall, this work develops a loss that has both a solid theoretical foundation and obvious improvement in practice. I believe this work will benefit the community.

Weaknesses

The weights of the spatial and temporal losses adapt to the graph connectivity. The authors are encouraged to compare this with alternatives, such as adaptive parameters learned directly by the network and fixed parameters.
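
The three weighting strategies named in this comment can be contrasted in a small sketch. Everything here is hypothetical: the paper's actual adaptive rule is not given on this page, so edge density stands in as one possible measure of graph connectivity, and the function names and the `base * (1 + density)` rule are assumptions chosen only to make the comparison concrete.

```python
import math

def edge_density(adjacency):
    """Fraction of possible undirected edges that are present (0 to 1)."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    edges = sum(adjacency[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)

def fixed_weight(base=1.0):
    # Strategy 1: a constant chosen by hand before training.
    return base

def connectivity_weight(adjacency, base=1.0):
    # Strategy 2 (assumed rule): scale the weight up as the graph gets denser.
    return base * (1.0 + edge_density(adjacency))

def learned_weight(raw_param):
    # Strategy 3: a free parameter trained with the network;
    # softplus keeps the resulting weight positive.
    return math.log1p(math.exp(raw_param))

# Hypothetical 3-node graph with edges (0, 1) and (1, 2).
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(fixed_weight(), connectivity_weight(adjacency), learned_weight(0.0))
```

An ablation of the kind the reviewer asks for would train the same tracker under each strategy and compare the resulting tracking metrics.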

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. UniTrack combines detection, identity, and temporal consistency into a single differentiable loss.
2. UniTrack can be applied to tracking architectures without network modification, demonstrating practical flexibility.
3. Experiments across MOT17, MOT20, SportsMOT, and DanceTrack show consistent gains (up to +9.7% MOTA, +12.3% IDF1) and reduced identity switches, highlighting the effectiveness and generality of the proposed framework.

Weaknesses

1. The method introduces more training complexity and memory overhead; scalability to large scenes or dense MOT scenarios remains a concern.
2. The authors should add more ablation studies on the adaptive Laplacian weighting.
3. The ablation section focuses mainly on component removal but could include more fine-grained analysis of hyperparameters (e.g., λs, λt updates, thresholding strategies).
4. It is recommended that the authors compare UniTrack with more recent MOT approaches, esp

Taxonomy

Topics: Video Surveillance and Tracking Methods · Gaze Tracking and Assistive Technology · Human Pose and Action Recognition