Multi-language Video Subtitle Dataset for Image-based Text Recognition
Thanadol Singkhornart, Olarik Surinta

TL;DR
This paper introduces a comprehensive multilingual video subtitle dataset with diverse characters and challenging backgrounds, aimed at advancing text recognition research in videos.
Contribution
It provides a large, diverse dataset of 4,224 subtitle images across multiple languages, supporting development of robust multilingual text recognition models.
Findings
Enhanced training data for multilingual OCR models
Improved accuracy in recognizing complex subtitle backgrounds
Facilitates research in deep learning for video text recognition
Abstract
The Multi-language Video Subtitle Dataset is a comprehensive collection designed to support research in text recognition across multiple languages. This dataset includes 4,224 subtitle images extracted from 24 videos sourced from online platforms. It features a wide variety of characters, including Thai consonants, vowels, tone marks, punctuation marks, numerals, Roman characters, and Arabic numerals. With 157 unique characters, the dataset provides a resource for addressing challenges in text recognition within complex backgrounds. It addresses the growing need for high-quality, multilingual text recognition data, particularly as videos with embedded subtitles become increasingly dominant on platforms like YouTube and Facebook. The variability in text length, font, and placement within these images adds complexity, offering a valuable resource for developing and evaluating deep…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHandwritten Text Recognition Techniques
