SoftCTC – Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels
Martin Kišš, Michal Hradiš, Karel Beneš, Petr Buchal, Michal Kula

TL;DR
SoftCTC introduces a novel semi-supervised learning loss for sequence tasks that considers multiple transcription variants simultaneously, eliminating the need for confidence filtering and improving efficiency.
Contribution
The paper proposes SoftCTC, a new loss function for semi-supervised sequence learning that handles multiple transcriptions without confidence filtering, enhancing efficiency and performance.
Findings
SoftCTC matches the performance of a finely tuned filtering-based pipeline on a challenging handwriting recognition task.
It is significantly more computationally efficient than a naïve CTC-based approach for training on multiple transcription variants.
The GPU implementation is publicly available.
Abstract
This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition. We propose a novel loss function, SoftCTC, which is an extension of CTC that allows multiple transcription variants to be considered at the same time. This makes it possible to omit the confidence-based filtering step, which is otherwise a crucial component of pseudo-labeling approaches to semi-supervised learning. We demonstrate the effectiveness of our method on a challenging handwriting recognition task and conclude that SoftCTC matches the performance of a finely tuned filtering-based pipeline. We also evaluate SoftCTC in terms of computational efficiency, concluding that it is significantly more efficient than a naïve CTC-based approach for training on multiple transcription variants, and we make our GPU implementation public.
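To make the comparison in the abstract concrete, the following is a minimal sketch of what a naïve CTC-based treatment of multiple transcription variants might look like. It is not SoftCTC and not the authors' implementation. It assumes (this is not stated in the abstract) that each variant y_k carries a weight w_k and that the loss is -log Σ_k w_k · p(y_k | x), with each p(y_k | x) computed by a separate run of the standard CTC forward algorithm. The function names `ctc_log_likelihood` and `naive_multi_variant_loss` are hypothetical.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_likelihood(log_probs, target, blank=0):
    """Standard CTC forward algorithm: returns log p(target | x).

    log_probs: T x V nested list of per-frame log-probabilities.
    target:    list of label indices (without blanks).
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    # Initialisation: a path may start in the first blank or the first label.
    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * size
        for s in range(size):
            cands = [alpha[s]]
            if s >= 1:
                cands.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new

    # A valid path ends in the last label or the trailing blank.
    tails = [alpha[size - 1]]
    if size > 1:
        tails.append(alpha[size - 2])
    return logsumexp(tails)


def naive_multi_variant_loss(log_probs, variants, blank=0):
    """Assumed naive baseline: -log sum_k w_k * p(y_k | x).

    variants: list of (target, weight) pairs; one full CTC pass per variant.
    """
    terms = [
        math.log(w) + ctc_log_likelihood(log_probs, y, blank)
        for y, w in variants
        if w > 0
    ]
    return -logsumexp(terms)
```

In this sketch the cost grows linearly with the number of variants, since every variant needs its own forward pass. That is the kind of overhead the abstract's efficiency claim is contrasting against.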
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Text and Document Classification Technologies
