TL;DR
This paper introduces a novel spatio-temporal complementary model for video text tracking that improves trajectory completeness and accuracy by leveraging continuity and semantic cues, outperforming existing methods.
Contribution
The paper proposes a spatio-temporal complementary model that combines a Siamese Complementary Module with semantic-visual integration, improving robustness and discrimination in complex video text tracking scenarios.
Findings
Achieves state-of-the-art performance on public benchmarks.
Effectively alleviates missed detections and trajectory breaks.
Improves discrimination among similar-looking text instances.
Abstract
Text tracking is to track multiple texts in a video, and construct a trajectory for each text. Existing methods tackle this task by utilizing the tracking-by-detection framework, i.e., detecting the text instances in each frame and associating the corresponding text instances in consecutive frames. We argue that the tracking accuracy of this paradigm is severely limited in more complex scenarios, e.g., owing to motion blur, etc., the missed detection of text instances causes the break of the text trajectory. In addition, different text instances with similar appearance are easily confused, leading to the incorrect association of the text instances. To this end, a novel spatio-temporal complementary text tracking model is proposed in this paper. We leverage a Siamese Complementary Module to fully exploit the continuity characteristic of the text instances in the temporal dimension, which effectively…
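To make the failure mode described in the abstract concrete, here is a minimal sketch of a generic tracking-by-detection baseline. It is not the paper's model: the greedy IoU matching, the function names (iou, track), the 0.5 threshold, and the toy boxes are all assumptions chosen for illustration. It only shows how a single missed detection splits one text instance into two trajectories when association relies purely on consecutive-frame detections.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track(frames: List[List[Box]],
          iou_thresh: float = 0.5) -> Dict[int, List[Tuple[int, Box]]]:
    """Greedy association between consecutive frames only.

    Returns a mapping track_id -> [(frame_index, box), ...].
    A track that finds no match in the next frame is dropped, which is
    the simplification that exposes the trajectory-break problem.
    """
    tracks: Dict[int, List[Tuple[int, Box]]] = {}
    active: Dict[int, Box] = {}  # track_id -> box in the previous frame
    next_id = 0
    for t, dets in enumerate(frames):
        unmatched = list(range(len(dets)))
        # All (score, track, detection) candidates, best overlap first.
        pairs = sorted(
            ((iou(box, dets[j]), tid, j)
             for tid, box in active.items() for j in unmatched),
            reverse=True,
        )
        new_active: Dict[int, Box] = {}
        used_tracks = set()
        for score, tid, j in pairs:
            if score < iou_thresh:
                break  # remaining candidates overlap even less
            if tid in used_tracks or j not in unmatched:
                continue
            used_tracks.add(tid)
            unmatched.remove(j)
            tracks[tid].append((t, dets[j]))
            new_active[tid] = dets[j]
        # Every unmatched detection starts a new trajectory.
        for j in unmatched:
            tracks[next_id] = [(t, dets[j])]
            new_active[next_id] = dets[j]
            next_id += 1
        active = new_active
    return tracks


# One text instance drifting right across three frames (toy data).
frames_full = [[(0, 0, 10, 4)], [(1, 0, 11, 4)], [(2, 0, 12, 4)]]
# Same video, but the detector misses the text in the middle frame.
frames_missed = [[(0, 0, 10, 4)], [], [(2, 0, 12, 4)]]

print(len(track(frames_full)))    # 1 trajectory
print(len(track(frames_missed)))  # 2 trajectories: the track is broken
```

In this sketch the second run produces two identities for the same text, which is the kind of trajectory break that the abstract says the Siamese Complementary Module is meant to address by exploiting temporal continuity.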
