Weakly Supervised Multi-Object Tracking and Segmentation
Idoia Ruiz, Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder, Joan Serrat

TL;DR
This paper proposes a weakly supervised approach for multi-object tracking and segmentation that leverages multi-task learning and Grad-CAM heatmaps to reduce annotation requirements while maintaining competitive performance.
Contribution
It introduces a novel training strategy combining classification and tracking with weak localization cues for joint segmentation and tracking without mask annotations.
Findings
Achieves near-supervised performance on the KITTI MOTS benchmark.
Reduces the gap between the fully supervised and weakly supervised approaches on the MOTSP metric to 12% for cars and pedestrians.
Demonstrates the effectiveness of weakly supervised learning in complex vision tasks.
Abstract
We introduce the problem of weakly supervised Multi-Object Tracking and Segmentation, i.e. joint weakly supervised instance segmentation and multi-object tracking, in which we do not provide any kind of mask annotation. To address it, we design a novel synergistic training strategy by taking advantage of multi-task learning, i.e. classification and tracking tasks guide the training of the unsupervised instance segmentation. For that purpose, we extract weak foreground localization information, provided by Grad-CAM heatmaps, to generate a partial ground truth to learn from. Additionally, RGB image level information is employed to refine the mask prediction at the edges of the objects. We evaluate our method on KITTI MOTS, the most representative benchmark for this task, reducing the performance gap on the MOTSP metric between the fully supervised and weakly supervised approach to just…
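The idea of turning a heatmap into a partial ground truth can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Grad-CAM heatmap already normalized to [0, 1] and a bounding box for the object. The function name `partial_labels`, the threshold `fg_thresh`, the `IGNORE` value, and the choice to treat pixels outside the box as background are all hypothetical choices made for illustration.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for pixels that carry no supervision


def partial_labels(heatmap, box, fg_thresh=0.5):
    """Build a partial label map from a heatmap and a bounding box.

    heatmap: (H, W) float array in [0, 1], e.g. a normalized Grad-CAM map.
    box: (x0, y0, x1, y1) integer pixel coordinates, x1 and y1 exclusive.
    Returns an (H, W) int8 array: 1 = foreground, 0 = background,
    IGNORE = unlabeled.
    """
    x0, y0, x1, y1 = box
    # Assumption: everything outside the box is background.
    labels = np.zeros(heatmap.shape, dtype=np.int8)
    inside = labels[y0:y1, x0:x1]  # basic slice, so this is a view
    inside[...] = IGNORE           # inside the box: unknown by default
    # Strong heatmap responses inside the box become foreground.
    inside[heatmap[y0:y1, x0:x1] >= fg_thresh] = 1
    return labels


# Tiny usage example with a single strong response at (row 1, col 1).
heatmap = np.zeros((4, 4), dtype=np.float32)
heatmap[1, 1] = 0.9
labels = partial_labels(heatmap, box=(1, 1, 3, 3))
```

A segmentation loss would then be computed only over pixels whose label is not `IGNORE`, so that the unlabeled region inside the box neither rewards nor penalizes the predicted mask.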
