Is Multiple Object Tracking a Matter of Specialization?
Gianluca Mancusi, Mattia Bernardi, Aniello Panariello, Angelo Porrello, Rita Cucchiara, Simone Calderara

TL;DR
This paper introduces PASTA, a modular, parameter-efficient framework for multi-object tracking that improves domain generalization by training specialized modules for different scene attributes, outperforming monolithic models.
Contribution
The paper proposes PASTA, a novel modular architecture that combines Parameter-Efficient Fine-Tuning (PEFT) and Modular Deep Learning (MDL) for scenario-specific tracking, improving generalization without increasing inference time.
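One way to see why composing modules in parameter space need not add inference cost is the following sketch. It assumes LoRA-style PEFT modules (an assumption; the summary says only "PEFT"), where expert $i$ contributes a low-rank update $B_i A_i$, and hypothetical mixing weights $\lambda_i$ over the set $\mathcal{S}$ of experts selected for a scene. The selected updates are folded into the frozen weight $W_0$ once, before inference:

```latex
W' = W_0 + \sum_{i \in \mathcal{S}} \lambda_i \, B_i A_i,
\qquad
h = W' x
```

Because $W'$ has the same shape as $W_0$, the forward pass remains a single matrix product of the original size, regardless of how many experts were merged.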
Findings
Modules trained on specific scene attributes improve tracking performance.
Zero-shot evaluations show superior generalization to unseen domains.
Models and code are publicly released for reproducibility.
Abstract
End-to-end transformer-based trackers have achieved remarkable performance on most human-related datasets. However, training these trackers in heterogeneous scenarios poses significant challenges, including negative interference (the model learning conflicting scene-specific parameters) and limited domain generalization, which often necessitates expensive fine-tuning to adapt the models to new domains. In response to these challenges, we introduce Parameter-efficient Scenario-specific Tracking Architecture (PASTA), a novel framework that combines Parameter-Efficient Fine-Tuning (PEFT) and Modular Deep Learning (MDL). Specifically, we define key scenario attributes (e.g., camera viewpoint, lighting conditions) and train specialized PEFT modules for each attribute. These expert modules are combined in parameter space, enabling systematic generalization to new domains without…
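The attribute-based selection and parameter-space composition described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes LoRA-style low-rank experts on a single linear layer, a made-up attribute vocabulary (`viewpoint`, `lighting`), and uniform averaging of the selected experts' weight deltas. All names (`make_expert`, `compose`, `experts`) are illustrative.

```python
import numpy as np

# Toy sizes for one frozen linear layer of a tracker (hypothetical).
D_OUT, D_IN, RANK = 8, 8, 2
rng = np.random.default_rng(0)

W_base = rng.standard_normal((D_OUT, D_IN))  # frozen pretrained weight


def make_expert(rng):
    """Return a hypothetical LoRA-style expert (B, A); its delta is B @ A."""
    B = rng.standard_normal((D_OUT, RANK)) * 0.1
    A = rng.standard_normal((RANK, D_IN)) * 0.1
    return B, A


# One expert per (attribute, value) pair; the vocabulary is an assumption.
experts = {
    ("viewpoint", "top"): make_expert(rng),
    ("viewpoint", "frontal"): make_expert(rng),
    ("lighting", "day"): make_expert(rng),
    ("lighting", "night"): make_expert(rng),
}


def compose(W, experts, scene_attrs, weights=None):
    """Merge the deltas of the experts matching scene_attrs into W."""
    keys = list(scene_attrs.items())
    if weights is None:
        # Uniform average over the selected experts (an assumption).
        weights = [1.0 / len(keys)] * len(keys)
    delta = np.zeros_like(W)
    for key, w in zip(keys, weights):
        B, A = experts[key]
        delta += w * (B @ A)
    return W + delta


# Describe the scene by its attributes, then build a scene-specific weight.
scene = {"viewpoint": "top", "lighting": "night"}
W_scene = compose(W_base, experts, scene)

# Inference uses one matmul with a weight of the original shape.
x = rng.standard_normal(D_IN)
y = W_scene @ x
```

Changing `scene` (for example to `{"viewpoint": "frontal", "lighting": "day"}`) selects a different pair of experts, and passing explicit `weights` replaces the uniform average. Merging before inference keeps the layer's shape and cost unchanged, at the price of recomputing `W_scene` whenever the scene description changes.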
Taxonomy
Topics: Categorization, perception, and language
