Joint Energy-based Detection and Classification of Multilingual Text Lines
Igor Milevskiy, Yuri Boykov

TL;DR
This paper introduces a hierarchical MDL-based model for joint detection and classification of multilingual text lines in images, effectively handling complex scripts and typographic variations.
Contribution
The paper presents a novel MDL-based energy model that integrates geometric and classification data costs for multilingual text detection and classification.
Findings
Robust detection and classification on a large multilingual dataset.
Effective minimization via fusion moves.
Applicable to complex scripts like Korean and Chinese.
Abstract
This paper proposes a new hierarchical MDL-based model for a joint detection and classification of multilingual text lines in images taken by hand-held cameras. The majority of related text detection methods assume alphabet-based writing in a single language, e.g. in Latin. They use simple clustering heuristics specific to such texts: proximity between letters within one line, larger distance between separate lines, etc. We are interested in a significantly more ambiguous problem where images combine alphabet and logographic characters from multiple languages and typographic rules vary a lot (e.g. English, Korean, and Chinese). Complexity of detecting and classifying text lines in multiple languages calls for a more principled approach based on information-theoretic principles. Our new MDL model includes data costs combining geometric errors with classification likelihoods and a…
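To make the idea of a data cost that combines geometric error with classification likelihood more concrete, here is a minimal sketch of a flat MDL-style labeling energy. It is not the authors' model: the function names, the `OUTLIER` label, the straight-line parameterization, the vertical (rather than perpendicular) residual, and all cost constants are assumptions chosen for illustration, and the hierarchy and any further terms of the paper's energy are omitted.

```python
import math

OUTLIER = -1  # hypothetical label for characters assigned to no text line


def data_cost(char, line, sigma=2.0):
    """Cost of assigning one character candidate to one line model.

    char: (x, y, script_probs), where script_probs maps script name -> likelihood.
    line: (slope, intercept, script), a straight baseline with a script class.
    """
    x, y, script_probs = char
    slope, intercept, script = line
    # Geometric term: squared vertical residual under an assumed Gaussian noise model.
    residual = y - (slope * x + intercept)
    geometric = residual ** 2 / (2.0 * sigma ** 2)
    # Classification term: negative log-likelihood of the line's script class.
    classification = -math.log(max(script_probs.get(script, 0.0), 1e-9))
    return geometric + classification


def energy(chars, lines, labels, line_cost=3.0, outlier_cost=10.0):
    """Sum of per-character data costs plus a fixed cost per line model in use."""
    total = 0.0
    for char, label in zip(chars, labels):
        total += outlier_cost if label == OUTLIER else data_cost(char, lines[label])
    used = {label for label in labels if label != OUTLIER}
    return total + line_cost * len(used)


chars = [(0.0, 0.0, {"latin": 1.0}), (10.0, 0.0, {"latin": 1.0})]
lines = [(0.0, 0.0, "latin")]
explained = energy(chars, lines, [0, 0])
unexplained = energy(chars, lines, [OUTLIER, OUTLIER])
```

In this toy setup, explaining both characters with one Latin line costs less than leaving them as outliers, which is the trade-off a description-length penalty is meant to capture: a line model is kept only when the data it explains pays for its own cost.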
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Image Processing and 3D Reconstruction
Methods: Minimum Description Length
