End-to-End Subtitle Detection and Recognition for Videos in East Asian Languages via CNN Ensemble with Near-Human-Level Performance

Yan Xu; Siyuan Shan; Ziming Qiu; Zhipeng Jia; Zhengyang Shen; Yipei Wang; Mengfei Shi; Eric I-Chao Chang

arXiv:1611.06159·cs.CV·November 27, 2017

TL;DR

This paper introduces an end-to-end system for detecting and recognizing East Asian video subtitles, achieving near-human accuracy through CNN ensembles and language models, significantly outperforming existing methods.

Contribution

The paper presents a novel end-to-end subtitle detection and recognition system using CNN ensembles trained on synthetic data, with a new detection operator and language model integration.

Findings

01

Achieved average end-to-end accuracies of 98.2% on 40 Simplified Chinese videos and 98.3% on 40 Traditional Chinese videos.

02

Outperformed existing subtitle recognition methods.

03

System approaches human-level recognition performance.

Abstract

In this paper, we propose an end-to-end subtitle detection and recognition system for videos in East Asian languages. The system consists of multiple stages. Subtitles are first detected by a novel image operator based on the sequence information of consecutive video frames. Then, an ensemble of Convolutional Neural Networks (CNNs) trained on synthetic data is adopted for detecting and recognizing East Asian characters. Finally, a dynamic programming approach leveraging language models is applied to assemble the recognition results for entire text lines. The proposed system achieves average end-to-end accuracies of 98.2% and 98.3% on 40 videos in Simplified Chinese and 40 videos in Traditional Chinese respectively, significantly outperforming existing methods. The near-perfect accuracy of our system dramatically narrows the gap between human…
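The last stage in the abstract, dynamic programming that combines per-character CNN scores with a language model, can be illustrated with a small sketch. The abstract does not specify the formulation, so the code below is an assumption: a generic Viterbi-style decoder over a bigram model. The function `viterbi_decode`, the `lm_weight` parameter, and all toy scores are hypothetical and are not taken from the paper.

```python
import math


def viterbi_decode(candidates, bigram_logp, lm_weight=1.0):
    """Pick the best character sequence for one text line.

    candidates: one non-empty list per character position, each holding
                (char, log_prob) pairs, e.g. averaged CNN ensemble scores.
    bigram_logp: function (prev_char, char) -> log probability (hypothetical LM).
    lm_weight: how strongly the language model influences the result.
    """
    if not candidates:
        return ""
    # best[ch] = (score, path) of the best sequence ending in ch so far
    best = {ch: (lp, [ch]) for ch, lp in candidates[0]}
    for position in candidates[1:]:
        new_best = {}
        for ch, lp in position:
            # Choose the predecessor that maximizes score + weighted LM term
            score, path = max(
                (
                    (prev_score + lm_weight * bigram_logp(prev, ch), prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[ch] = (score + lp, path + [ch])
        best = new_best
    return "".join(max(best.values(), key=lambda item: item[0])[1])


# Toy data (illustrative values only): the recognizer slightly prefers the
# visually similar but wrong character at the first position.
toy_candidates = [
    [("夭", math.log(0.55)), ("天", math.log(0.45))],
    [("气", math.log(0.90)), ("汽", math.log(0.10))],
]


def toy_bigram_logp(prev, cur):
    # Hypothetical bigram model: "天气" (weather) is a common word.
    return math.log(0.5) if (prev, cur) == ("天", "气") else math.log(0.01)


decoded = viterbi_decode(toy_candidates, toy_bigram_logp)
decoded_without_lm = viterbi_decode(toy_candidates, toy_bigram_logp, lm_weight=0.0)
```

In this toy case the language model overrides the recognizer's slight preference for the wrong first character, while setting `lm_weight=0.0` falls back to the per-character argmax. How the actual system weights the two sources is not stated in the abstract.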
