TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking

Peng Chu; Jiang Wang; Quanzeng You; Haibin Ling; Zicheng Liu

arXiv:2104.00194 · cs.CV · April 6, 2021

TL;DR

TransMOT introduces a graph transformer-based approach for multiple object tracking that models spatial-temporal interactions efficiently, achieving state-of-the-art accuracy while reducing computational costs.

Contribution

The paper presents a graph transformer architecture for multiple object tracking (MOT) that combines sparse weighted graphs with a cascade association framework to improve both tracking speed and accuracy.
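To make "sparse weighted graph" concrete, here is a minimal sketch under stated assumptions: each tracked object is a node, and two objects are connected only if their bounding boxes overlap, with the IoU as the edge weight. The helper names (`iou`, `build_sparse_graph`) and the IoU-based weighting are hypothetical choices for illustration, not necessarily how TransMOT builds its graphs.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def build_sparse_graph(boxes: List[Box],
                       min_weight: float = 0.0) -> Dict[Tuple[int, int], float]:
    """Hypothetical graph builder: keep edge (i, j) only if its weight > min_weight.

    Edge weight is the IoU of the two boxes (an assumption for this sketch).
    Self-loops get weight 1.0 so every node can still attend to itself.
    """
    edges: Dict[Tuple[int, int], float] = {}
    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            w = 1.0 if i == j else iou(a, b)
            if w > min_weight:
                edges[(i, j)] = w
    return edges


boxes = [(0, 0, 10, 10), (5, 0, 15, 10), (100, 100, 110, 110)]
graph = build_sparse_graph(boxes)
print(len(graph), "edges instead of", len(boxes) ** 2, "dense pairs")
```

With three objects, where only the first two overlap, the graph keeps 5 of the 9 possible pairs. Attention computed only over these edges scales with the number of edges rather than with the square of the number of objects, which is the intuition behind the efficiency claim.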

Findings

01. Achieves state-of-the-art performance on MOT datasets.
02. More computationally efficient than a standard Transformer, because interactions are modeled over sparse graphs rather than all object pairs.
03. Effectively handles low-score detections and occlusions.

Abstract

Tracking multiple objects in videos relies on modeling the spatial-temporal interactions of the objects. In this paper, we propose a solution named TransMOT, which leverages powerful graph transformers to efficiently model the spatial and temporal interactions among the objects. TransMOT effectively models the interactions of a large number of objects by arranging the trajectories of the tracked objects as a set of sparse weighted graphs, and constructing a spatial graph transformer encoder layer, a temporal transformer encoder layer, and a spatial graph transformer decoder layer based on the graphs. TransMOT is not only more computationally efficient than the traditional Transformer, but it also achieves better tracking accuracy. To further improve the tracking speed and accuracy, we propose a cascade association framework to handle low-score detections and long-term occlusions that…
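A spatial graph transformer layer differs from ordinary self-attention mainly in that each object attends only to its graph neighbors. The sketch below shows one attention head restricted to a sparse weighted graph, under several assumptions: node features are used directly as queries, keys and values (no learned projections), and edge weights enter by scaling the exponentiated scores, so attention from i to j is proportional to `w_ij * exp(q_i . k_j / sqrt(d))`. The function name and this weighting scheme are illustrative choices; the paper's exact formulation may differ.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def graph_masked_attention(x: List[Vector],
                           edges: Dict[Tuple[int, int], float]) -> List[Vector]:
    """One head of self-attention restricted to graph edges (hypothetical sketch).

    x: one feature vector per tracked object, used as query, key and value.
    edges: (i, j) -> weight > 0; node i attends only to nodes j with an edge.
    """
    d = len(x[0])
    neighbors: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(len(x))}
    for (i, j), w in edges.items():
        neighbors[i].append((j, w))
    out: List[Vector] = []
    for i, q in enumerate(x):
        nbrs = neighbors[i]
        if not nbrs:  # isolated node: pass its features through unchanged
            out.append(list(q))
            continue
        # Scaled dot-product scores, computed only for neighbors of i.
        scores = [sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d) for j, _ in nbrs]
        m = max(scores)  # subtract the max for numerical stability
        unnorm = [w * math.exp(s - m) for (_, w), s in zip(nbrs, scores)]
        z = sum(unnorm)
        # Weighted average of neighbor values.
        out.append([sum(p / z * x[j][k] for p, (j, _) in zip(unnorm, nbrs))
                    for k in range(d)])
    return out


x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
edges = {(0, 1): 1.0, (0, 2): 3.0, (1, 1): 1.0, (2, 2): 1.0}
print(graph_masked_attention(x, edges))
```

The work per node is proportional to its degree, so the total cost grows with the number of edges E (roughly E·d) instead of N²·d for dense attention over N objects. A temporal encoder layer, by contrast, would attend along each trajectory's history rather than across objects in one frame.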

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Dense Connections · Attention Is All You Need · Dropout · Residual Connection · Laplacian EigenMap · Byte Pair Encoding