Video Text Tracking With a Spatio-Temporal Complementary Model

Yuzhe Gao, Xing Li, Jiajian Zhang, Yu Zhou, Dian Jin, Jing Wang, Shenggao Zhu, and Xiang Bai

arXiv:2111.04987 · cs.CV · December 30, 2021

TL;DR

This paper introduces a novel spatio-temporal complementary model for video text tracking that improves trajectory completeness and accuracy by leveraging continuity and semantic cues, outperforming existing methods.

Contribution

The paper proposes a new spatio-temporal complementary model with a Siamese module and semantic-visual integration, enhancing robustness and discrimination in complex video text tracking scenarios.

Findings

01 Achieves state-of-the-art performance on public benchmarks.

02 Effectively alleviates missed detections and trajectory breaks.

03 Improves discrimination among similar-looking text instances.

Abstract

Text tracking is to track multiple texts in a video, and construct a trajectory for each text. Existing methods tackle this task by utilizing the tracking-by-detection framework, i.e., detecting the text instances in each frame and associating the corresponding text instances in consecutive frames. We argue that the tracking accuracy of this paradigm is severely limited in more complex scenarios, e.g., owing to motion blur, etc., the missed detection of text instances causes the break of the text trajectory. In addition, different text instances with similar appearance are easily confused, leading to the incorrect association of the text instances. To this end, a novel spatio-temporal complementary text tracking model is proposed in this paper. We leverage a Siamese Complementary Module to fully exploit the continuity characteristic of the text instances in the temporal dimension, which effectively…
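To make the limitation described in the abstract concrete, here is a minimal sketch of a generic tracking-by-detection baseline. It is not the paper's model and not code from the linked repository. The greedy IoU matching, the function names `iou` and `track_by_detection`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from itertools import count


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track_by_detection(frames, iou_threshold=0.5):
    """Assign a track id to every detected box, frame by frame.

    `frames` is a list of per-frame box lists. Each box is matched only
    against the boxes of the immediately preceding frame (hypothetical
    baseline; the threshold is an illustrative assumption).
    """
    next_id = count()
    prev = []  # (track_id, box) pairs from the previous frame
    ids_per_frame = []
    for boxes in frames:
        # Score every (previous box, current box) pair, best overlap first.
        pairs = sorted(
            (
                (iou(prev_box, box), pi, bi)
                for pi, (_, prev_box) in enumerate(prev)
                for bi, box in enumerate(boxes)
            ),
            reverse=True,
        )
        assigned, used_prev = {}, set()
        for score, pi, bi in pairs:
            if score < iou_threshold:
                break  # remaining pairs overlap even less
            if pi in used_prev or bi in assigned:
                continue
            assigned[bi] = prev[pi][0]
            used_prev.add(pi)
        # Unmatched detections start new trajectories.
        ids = [assigned[bi] if bi in assigned else next(next_id)
               for bi in range(len(boxes))]
        prev = list(zip(ids, boxes))
        ids_per_frame.append(ids)
    return ids_per_frame


# One text instance drifting right; the detector misses it in the third frame.
frames = [[(0, 0, 10, 10)], [(1, 0, 11, 10)], [], [(2, 0, 12, 10)]]
print(track_by_detection(frames))  # → [[0], [0], [], [1]]
```

In this toy run the same text instance receives id 0 before the missed frame and a fresh id 1 after it, which is the trajectory break the abstract attributes to missed detections. Because the matching uses box overlap alone, it also has no way to tell apart two similar-looking instances that sit close together.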

Code & Models

Repositories

lsabrinax/videotextscm (PyTorch, official)
