A Multiplexed Network for End-to-End, Multilingual OCR

Jing Huang; Guan Pang; Rama Kovvuri; Mandy Toh; Kevin J Liang; Praveen Krishnan; Xi Yin; Tal Hassner

arXiv:2103.15992 · cs.CV · March 31, 2021

TL;DR

This paper introduces a unified end-to-end multilingual OCR system that identifies the script of each word and recognizes text in multiple scripts within a single model, outperforming a comparable single-head model on end-to-end recognition and achieving state-of-the-art results on the MLT17 and MLT19 benchmarks.

Contribution

A multiplexed architecture that uses word-level script identification to route each word to a separate, script-specific recognition head within a single model, improving accuracy and scalability for multilingual OCR.
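To make the multiplexing idea concrete, here is a minimal, framework-free sketch of word-level routing: each detected word carries a vector of script scores, the highest-scoring script is selected, and the word is passed to that script's recognition head. It assumes the detector has already produced per-word features and script logits; the function `multiplex`, the script names, and the toy heads are hypothetical illustrations, not the API of facebookresearch/MultiplexedOCR.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical per-script recognition head: maps one word's features to a string.
RecognitionHead = Callable[[Sequence[float]], str]


def multiplex(
    word_features: List[Sequence[float]],
    script_logits: List[Sequence[float]],
    scripts: List[str],
    heads: Dict[str, RecognitionHead],
) -> List[Tuple[str, str]]:
    """Route each detected word to the recognition head of its predicted script."""
    results = []
    for feats, logits in zip(word_features, script_logits):
        # Hard routing: pick the index of the highest script score for this word.
        best = max(range(len(logits)), key=lambda i: logits[i])
        script = scripts[best]
        # Only the selected head runs on this word.
        results.append((script, heads[script](feats)))
    return results


if __name__ == "__main__":
    # Toy heads (hypothetical) that just report which head ran and the feature length.
    toy_heads = {
        "latin": lambda f: "L%d" % len(f),
        "cjk": lambda f: "C%d" % len(f),
    }
    print(multiplex([[0.1, 0.2], [0.3]], [[2.0, 0.5], [0.1, 1.5]],
                    ["latin", "cjk"], toy_heads))
```

In this sketch each word runs through exactly one recognizer, so inference cost stays close to that of a single head; the trade-off is that a script misclassification sends the word to the wrong head.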

Findings

01 Outperforms a single-head model with a similar number of parameters on end-to-end recognition

02 Achieves state-of-the-art results on the MLT17 and MLT19 joint text detection and script identification benchmarks

03 Demonstrates effective script identification and recognition in a unified framework

Abstract

Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results. However, many existing methods focus primarily on Latin-alphabet languages, often even only case-insensitive English characters. In this paper, we propose an E2E approach, Multiplexed Multilingual Mask TextSpotter, that performs script identification at the word level and handles different scripts with different recognition heads, all while maintaining a unified loss that simultaneously optimizes script identification and multiple recognition heads. Experiments show that our method outperforms the single-head model with a similar number of parameters in end-to-end recognition tasks, and achieves state-of-the-art results on MLT17 and MLT19 joint text detection and script identification benchmarks. We believe that our work is a step towards the…
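The unified loss mentioned in the abstract can be sketched as a per-word sum of a script-identification term and a recognition term. This is a minimal illustration under stated assumptions, not the paper's exact formulation: it assumes script identification is trained with softmax cross-entropy, that each word's recognition loss is taken only from the head matching its ground-truth script, and that the weights `alpha_script` and `alpha_rec` (like the function names) are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of a softmax over `logits`, using a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def unified_loss(
    script_logits: List[Sequence[float]],
    gt_script: List[int],
    rec_losses: List[Dict[int, float]],
    alpha_script: float = 1.0,  # hypothetical weight for the script-ID term
    alpha_rec: float = 1.0,     # hypothetical weight for the recognition term
) -> float:
    """Mean over words of weighted script-ID loss plus one head's recognition loss."""
    total = 0.0
    for logits, s, per_head in zip(script_logits, gt_script, rec_losses):
        total += alpha_script * softmax_cross_entropy(logits, s)
        # Assumption: only the head matching the ground-truth script is supervised.
        total += alpha_rec * per_head[s]
    return total / len(script_logits)


if __name__ == "__main__":
    # Two words: one Latin (index 0), one from a second script (index 1).
    print(unified_loss([[10.0, 0.0], [0.0, 10.0]], [0, 1], [{0: 0.5}, {1: 0.5}]))
```

Under these assumptions each word supervises exactly one head, so a batch that mixes scripts trains all heads jointly while the shared script classifier is optimized on every word.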

Code & Models

Repositories

facebookresearch/MultiplexedOCR
PyTorch · Official

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Multimodal Machine Learning Applications