LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model

Hongen Liu; Di Sun; Jiahao Wang; Yi Liu; Gang Pan

arXiv:2405.19194 · cs.CV · June 13, 2024 · 1 citation

TL;DR

The paper introduces LOGO, a novel video text spotting framework that combines language collaboration and glyph perception to improve detection, recognition, and tracking of text in videos, especially under challenging conditions.

Contribution

LOGO integrates a language synergy classifier and glyph supervision into existing text spotters, enhancing low-resolution text detection and recognition accuracy without extensive fine-tuning.

Findings

01

Improves detection and recognition of low-resolution text instances.

02

Effectively filters out text-like background regions.

03

Achieves state-of-the-art performance on public benchmarks.

Abstract

Video text spotting (VTS) aims to simultaneously localize, recognize and track text instances in videos. To address the limited recognition capability of end-to-end methods, recent methods track the zero-shot results of state-of-the-art image text spotters directly, and achieve impressive performance. However, owing to the domain gap between different datasets, these methods usually obtain limited tracking trajectories on extreme datasets. Fine-tuning transformer-based text spotters on specific datasets could yield performance enhancements, albeit at the expense of considerable training resources. In this paper, we propose a Language Collaboration and Glyph Perception Model, termed LOGO, an innovative framework designed to enhance the performance of conventional text spotters. To achieve this goal, we design a language synergy classifier (LSC) to explicitly discern text instances from…

Taxonomy

Topics: Video Analysis and Summarization · Human Motion and Animation · Multimodal Machine Learning Applications