SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels

Martin Kišš; Michal Hradiš; Karel Beneš; Petr Buchal; Michal Kula

arXiv:2212.02135·cs.LG·September 20, 2023·1 cites

TL;DR

SoftCTC is a loss function for semi-supervised sequence learning that extends CTC to consider multiple transcription variants simultaneously, which removes the need for confidence-based filtering of pseudo-labels.

Contribution

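The paper proposes SoftCTC, an extension of the CTC loss for semi-supervised sequence learning that trains on multiple transcription variants at once without confidence-based filtering. It matches the performance of a finely tuned filtering-based pipeline and is significantly more efficient than a naïve CTC-based approach to training on multiple variants.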

Findings

01

SoftCTC matches the performance of a finely tuned filtering-based pipeline on a challenging handwriting recognition task.

02

It is significantly more computationally efficient than a naïve CTC-based approach for training on multiple transcription variants.

03

The GPU implementation is publicly available.

Abstract

This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition. We propose a novel loss function, SoftCTC, which is an extension of CTC that allows multiple transcription variants to be considered at the same time. This makes it possible to omit the confidence-based filtering step, which is otherwise a crucial component of pseudo-labeling approaches to semi-supervised learning. We demonstrate the effectiveness of our method on a challenging handwriting recognition task and conclude that SoftCTC matches the performance of a finely tuned filtering-based pipeline. We also evaluated SoftCTC in terms of computational efficiency, concluding that it is significantly more efficient than a naïve CTC-based approach for training on multiple transcription variants, and we make our GPU implementation public.
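To make the "naïve CTC-based approach for training on multiple transcription variants" more concrete, the sketch below shows one way such a baseline could look: a plain CTC forward pass is run once per variant and the weighted likelihoods are summed. This is not the authors' SoftCTC and not their implementation. The function names, the variant weights, and the combination rule (negative log of the weighted sum) are assumptions made for illustration, and the paper's exact baseline may differ.

```python
import math


def ctc_likelihood(probs, target, blank=0):
    """Probability of `target` under CTC, via the standard forward algorithm.

    probs:  list of T frames, each a list of per-symbol probabilities.
    target: list of symbol indices without blanks.
    Works in the linear domain for readability; real implementations
    use log space to avoid underflow on long sequences.
    """
    # Extended target: a blank before, between, and after every label.
    ext = [blank]
    for s in target:
        ext += [s, blank]
    S = len(ext)

    # alpha[s]: total probability of alignments ending in ext[s] at frame t.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping the blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def naive_multi_variant_loss(probs, variants, blank=0):
    """Hypothetical naive baseline: one CTC forward pass per variant.

    variants: list of (target, weight) pairs, e.g. pseudo-label
    hypotheses with their confidences (an assumption of this sketch).
    """
    total = sum(w * ctc_likelihood(probs, t, blank) for t, w in variants)
    return -math.log(total)


# Toy input: 2 frames, 3 symbols (0 = blank), uniform predictions.
frames = [[1 / 3, 1 / 3, 1 / 3] for _ in range(2)]
hypotheses = [([1], 0.5), ([2], 0.5)]
print(naive_multi_variant_loss(frames, hypotheses))
```

The cost of this baseline grows linearly with the number of variants, since every variant needs its own forward pass. That is the kind of overhead the efficiency comparison in the abstract refers to.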

Code & Models

Repositories

dcgm/softctc
PyTorch · Official

Taxonomy

Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Text and Document Classification Technologies