GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking

Haibin He; Jing Zhang; Maoyuan Ye; Juhua Liu; Bo Du; Dacheng Tao

arXiv:2505.22228·cs.CV·January 1, 2026

TL;DR

GoMatching++ introduces a parameter- and data-efficient method that transforms existing image text spotters into effective video text spotters, achieving high performance with minimal training data and costs.

Contribution

The paper proposes freezing an off-the-shelf image text spotter and adding a lightweight, trainable tracker, together with a rescoring mechanism and the LST-Matcher, so that the spotter can be adapted to video text spotting efficiently.
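To make the division of labour concrete, here is a minimal sketch of the tracking-by-association idea: per-frame text instances (which would come from the frozen image text spotter) are linked to existing tracks by embedding similarity. This is not the paper's LST-Matcher. The function names, the fixed cosine similarity (a stand-in for a learned matcher), the greedy assignment, and the `threshold` value are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def associate(tracks, detections, threshold=0.5):
    """Greedy one-to-one association of detections to existing tracks.

    tracks:      dict mapping track id -> embedding from an earlier frame
    detections:  list of embeddings for the current frame (hypothetically
                 produced by a frozen image text spotter)
    Returns one track id per detection; unmatched detections get new ids.
    """
    # Score every (track, detection) pair, best matches first.
    pairs = sorted(
        ((cosine(emb, det), tid, j)
         for tid, emb in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    assigned = [None] * len(detections)
    used_tracks = set()
    for sim, tid, j in pairs:
        if sim < threshold:
            break  # remaining pairs are too dissimilar to link
        if tid in used_tracks or assigned[j] is not None:
            continue
        assigned[j] = tid
        used_tracks.add(tid)
    # Start a new track for every detection left unmatched.
    next_id = max(tracks, default=-1) + 1
    for j in range(len(detections)):
        if assigned[j] is None:
            assigned[j] = next_id
            next_id += 1
    return assigned


tracks = {0: [1.0, 0.0], 1: [0.0, 1.0]}
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
ids = associate(tracks, detections)
```

In this toy example the first two detections are linked to tracks 1 and 0 respectively, and the third, which resembles neither track, starts a new one. In a setup like the one described above, only the component that scores the pairs would be trained, while the spotter producing the detections stays fixed.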

Findings

01. Sets new performance records on the ICDAR15-video, DSText, and BOVText benchmarks.

02. Significantly reduces training data requirements and computational costs.

03. Introduces ArTVideo, a new benchmark for curved text in video.

Abstract

Video text spotting (VTS) extends image text spotting (ITS) by adding text tracking, significantly increasing task complexity. Despite progress in VTS, existing methods still fall short of the performance seen in ITS. This paper identifies a key limitation in current video text spotters: limited recognition capability, even after extensive end-to-end training. To address this, we propose GoMatching++, a parameter- and data-efficient method that transforms an off-the-shelf image text spotter into a video specialist. The core idea lies in freezing the image text spotter and introducing a lightweight, trainable tracker, which can be optimized efficiently with minimal training data. Our approach includes two key components: (1) a rescoring mechanism to bridge the domain gap between image and video data, and (2) the LST-Matcher, which enhances the frozen image text spotter's ability to…

Code & Models

Repositories

hxyz-123/gomatching
PyTorch · Official

Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques