Scalable Mask Annotation for Video Text Spotting

Haibin He; Jing Zhang; Mengyang Xu; Juhua Liu; Bo Du; Dacheng Tao

arXiv:2305.01443 · cs.CV · May 3, 2023 · 6 cites

TL;DR

This paper introduces SAMText, a scalable mask annotation pipeline for video text spotting, creating a large dataset with over 9 million mask annotations to improve text localization and recognition in videos.

Contribution

The paper presents SAMText, a scalable annotation method built on the Segment Anything Model (SAM), and releases SAMText-9M, a large dataset of mask annotations for video text spotting.

Findings

01 Generated over 9 million mask annotations for video frames.

02 Provided a thorough analysis of mask quality and dataset statistics.

03 Enabled new research directions in text boundary detection.

Abstract

Video text spotting refers to localizing, recognizing, and tracking textual elements such as captions, logos, license plates, signs, and other forms of text within consecutive video frames. However, current datasets available for this task rely on quadrilateral ground truth annotations, which may result in including excessive background content and inaccurate text boundaries. Furthermore, methods trained on these datasets often produce prediction results in the form of quadrilateral boxes, which limits their ability to handle complex scenarios such as dense or curved text. To address these issues, we propose a scalable mask annotation pipeline called SAMText for video text spotting. SAMText leverages the SAM model to generate mask annotations for scene text images or video frames at scale. Using SAMText, we have created a large-scale dataset, SAMText-9M, that contains over 2,400 video…
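The abstract contrasts quadrilateral ground truth with mask annotations. As a minimal sketch of how an existing quadrilateral could be prepared as a prompt for a promptable segmenter such as SAM, the snippet below converts four corner points into an axis-aligned box in (x_min, y_min, x_max, y_max) order. This is an assumption for illustration: the truncated abstract does not say which prompt type SAMText uses, and the helper names `quad_to_xyxy` and `polygon_area` are hypothetical, not taken from the paper or its repository.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def quad_to_xyxy(quad: List[Point], width: int, height: int) -> Tuple[float, float, float, float]:
    """Return the axis-aligned box enclosing a quadrilateral, clipped to the frame.

    Hypothetical helper: the (x_min, y_min, x_max, y_max) layout is assumed
    here as a convenient box-prompt format, not confirmed by the source.
    """
    if len(quad) != 4:
        raise ValueError("expected exactly four corner points")
    xs = [float(p[0]) for p in quad]
    ys = [float(p[1]) for p in quad]
    # Clip to valid pixel coordinates so the box never leaves the frame.
    x_min = max(0.0, min(xs))
    y_min = max(0.0, min(ys))
    x_max = min(float(width - 1), max(xs))
    y_max = min(float(height - 1), max(ys))
    return (x_min, y_min, x_max, y_max)


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


if __name__ == "__main__":
    # Illustrative corner points for a slightly rotated text instance.
    quad = [(10, 20), (110, 10), (115, 40), (15, 50)]
    box = quad_to_xyxy(quad, width=200, height=100)
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    print("box prompt:", box)
    print("quad area / box area:", polygon_area(quad) / box_area)
```

The area ratio gives a rough sense of how much extra background a coarser annotation shape encloses; a pixel-level mask would tighten the region further than either shape.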

Code & Models

Repositories

vitae-transformer/samtext (Official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Multimodal Machine Learning Applications

Methods: Segment Anything Model