SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
Daehee Kim, Yoonsik Kim, DongHyun Kim, Yumin Lim, Geewook Kim, Taeho, Kil

TL;DR
SCOB is a novel pre-training method that uses character-wise supervised contrastive learning with online text rendering to improve universal text understanding across diverse document and scene text domains, reducing annotation costs and bridging domain gaps.
Contribution
The paper introduces SCOB, a new pre-training approach that enhances cross-domain text understanding by leveraging contrastive learning and online text rendering, enabling weakly supervised learning.
Findings
SCOB improves performance over vanilla pre-training methods.
SCOB achieves comparable results to state-of-the-art techniques.
SCOB effectively bridges domain gaps in text understanding.
Abstract
Inspired by the great success of language model (LM)-based pre-training, recent studies in visual document understanding have explored LM-based pre-training methods for modeling text within document images. Among them, pre-training that reads all text from an image has shown promise, but often exhibits instability and even fails when applied to broader domains, such as those involving both visual documents and scene text images. This is a substantial limitation for real-world scenarios, where the processing of text image inputs in diverse domains is essential. In this paper, we investigate effective pre-training tasks in the broader domains and also propose a novel pre-training method called SCOB that leverages character-wise supervised contrastive learning with online text rendering to effectively pre-train document and scene text domains by bridging the domain gap. Moreover, SCOB…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
MethodsContrastive Learning
