TL;DR
This paper presents a two-stage learning approach using a Siamese CNN and gradient boosting to improve data association in pedestrian tracking, achieving state-of-the-art results on public datasets.
Contribution
Introduces a novel two-stage learning scheme that combines a Siamese CNN with a gradient boosting classifier for robust target association in tracking.
Findings
A simple, efficient linear programming tracker fed with the learned matching probabilities can outperform much more complex models.
Achieves state-of-the-art performance on multiple pedestrian tracking datasets.
Effectively integrates local spatio-temporal features (pixel values and optical flow) with contextual information (position and size of the compared patches).
Abstract
This paper introduces a novel approach to data association in pedestrian tracking: a two-stage learning scheme to match pairs of detections. First, a Siamese convolutional neural network (CNN) is trained to learn descriptors encoding local spatio-temporal structures between the two input image patches, aggregating pixel values and optical flow information. Second, a set of contextual features derived from the position and size of the compared input patches is combined with the CNN output by means of a gradient boosting classifier to generate the final matching probability. This learning approach is validated using a linear programming based multi-person tracker, showing that even a simple and efficient tracker may outperform much more complex models when fed with our learned matching probabilities. Results on publicly available sequences…
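To make the second stage concrete, the following is a minimal sketch of how a stage-two input vector could be assembled from the CNN output and contextual features of a detection pair. The abstract only says the contextual features derive from position and size, so the `Detection` type, the specific features (normalised centre offsets, relative size change, frame gap), and the function names are all illustrative assumptions rather than the paper's actual feature set.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical pedestrian detection: top-left corner, size, frame index."""
    x: float
    y: float
    w: float
    h: float
    frame: int


def contextual_features(a: Detection, b: Detection) -> List[float]:
    """Assumed context features built from the position and size of two boxes."""
    # Normalise by the mean box height so the features do not depend on scale.
    mean_h = 0.5 * (a.h + b.h)
    mean_w = 0.5 * (a.w + b.w)
    # Box centres.
    ax, ay = a.x + 0.5 * a.w, a.y + 0.5 * a.h
    bx, by = b.x + 0.5 * b.w, b.y + 0.5 * b.h
    dx = (bx - ax) / mean_h          # horizontal centre offset
    dy = (by - ay) / mean_h          # vertical centre offset
    dw = (b.w - a.w) / mean_w        # relative width change
    dh = (b.h - a.h) / mean_h        # relative height change
    dt = float(b.frame - a.frame)    # temporal gap in frames
    return [dx, dy, dw, dh, dt]


def stage_two_input(cnn_score: float, a: Detection, b: Detection) -> List[float]:
    """Concatenate the stage-one CNN output with the context features."""
    return [cnn_score] + contextual_features(a, b)


# Example pair: the same-sized box shifted 10 pixels right one frame later.
a = Detection(x=100.0, y=50.0, w=40.0, h=100.0, frame=10)
b = Detection(x=110.0, y=50.0, w=40.0, h=100.0, frame=11)
features = stage_two_input(0.9, a, b)
```

Vectors like `features`, labelled as same-person or different-person pairs, could then be used to fit any gradient boosting classifier (for instance scikit-learn's `GradientBoostingClassifier`), whose predicted probability would play the role of the final matching probability.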
Videos
Learning by Tracking: Siamese CNN for Robust Target Association (YouTube)
