Extracting textual overlays from social media videos using neural networks
Adam Słucki, Tomasz Trzcinski, Adam Bielski, Paweł Cyrta

TL;DR
This paper introduces a neural network-based pipeline that extracts textual overlays from social media videos for use in content analysis. It combines keyframe extraction, text detection, and text recognition, with the recognizer trained on synthetically generated data.
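The three-stage pipeline summarized above can be sketched as a chain of plain functions. This is a minimal illustration, not the authors' code: `extract_overlays`, `Overlay`, and the stage signatures are hypothetical names, and each stage is passed in as a callable so any keyframe selector, text detector, or recognizer could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates


@dataclass
class Overlay:
    """One recognized text overlay (hypothetical container type)."""
    frame_index: int
    box: Box
    text: str


def extract_overlays(
    frames: Sequence,
    select_keyframes: Callable[[Sequence], List[int]],
    detect_text: Callable[[object], List[Box]],
    recognize_text: Callable[[object, Box], str],
) -> List[Overlay]:
    """Run keyframe extraction -> text detection -> text recognition.

    All three stage functions are hypothetical placeholders supplied by the caller.
    """
    overlays: List[Overlay] = []
    for idx in select_keyframes(frames):  # stage 1: pick representative frames
        frame = frames[idx]
        for box in detect_text(frame):  # stage 2: locate text regions
            text = recognize_text(frame, box)  # stage 3: read the cropped region
            if text:  # drop empty recognitions
                overlays.append(Overlay(idx, box, text))
    return overlays
```

Keeping the stages injectable makes it easy to test each one in isolation, for example with dummy lambdas in place of real networks.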
Contribution
The work presents a text recognition module inspired by the convolutional recurrent neural network (CRNN) architecture, trained on a synthetic dataset and combined with a filtering technique, which together bring overlay text extraction accuracy above 80%.
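The original CRNN design pairs a convolutional feature extractor with a recurrent layer and is trained with a CTC (connectionist temporal classification) output layer. Assuming the recognition module here keeps that CTC output, which this summary does not confirm, per-timestep predictions can be turned into a string with greedy (best-path) decoding: take the most likely class at each step, collapse repeats, and drop blanks. The blank index of 0 and the `alphabet` mapping are assumptions for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy (best-path) CTC decoding.

    probs: per-timestep class scores with shape [T][C]; class 0 is the blank
    and class k (k >= 1) maps to alphabet[k - 1].
    """
    # Most likely class at every timestep.
    best_path = [max(range(len(step)), key=step.__getitem__) for step in probs]
    chars: List[str] = []
    prev = BLANK
    for k in best_path:
        # Emit only when the class changes and is not the blank; a blank
        # between two identical classes lets genuine double letters survive.
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)
```

Greedy decoding is the simplest CTC decoder; beam search or a lexicon constraint can do better on noisy frames at extra cost.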
Findings
Achieved over 80% accuracy in overlay text extraction.
A synthetic dataset of over 600,000 text images improved recognition performance.
Filtering method reduced overlapping text errors.
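The abstract below is cut off before the filtering method is described, so its actual mechanism is unknown here. As a stand-in only, one standard way to suppress overlapping text detections is greedy non-maximum suppression (NMS) based on intersection over union (IoU): keep the highest-scoring box and discard any box that overlaps an already kept one above a threshold. The function names and the 0.5 threshold are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_overlaps(
    detections: List[Tuple[Box, float]], threshold: float = 0.5
) -> List[Tuple[Box, float]]:
    """Greedy NMS: visit detections by descending score and keep a box only
    if its IoU with every previously kept box is at most `threshold`."""
    kept: List[Tuple[Box, float]] = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= threshold for k in kept):
            kept.append((box, score))
    return kept
```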
Abstract
Textual overlays are often used in social media videos because people who watch them without sound would otherwise miss essential information conveyed in the audio stream. Extracting these overlays can therefore provide an important metadata source, e.g. for content classification or retrieval tasks. In this work, we present a robust method for extracting textual overlays from videos that builds on multiple neural network architectures. The proposed solution relies on several processing steps: keyframe extraction, text detection and text recognition. The main component of our system, i.e. the text recognition module, is inspired by a convolutional recurrent neural network architecture, and we improve its performance using a synthetically generated dataset of over 600,000 images with text, prepared by the authors specifically for this task. We also develop a filtering method that…
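The abstract names keyframe extraction as the first step but does not say how keyframes are chosen. A common simple heuristic, used here purely as an assumption, is frame differencing: keep the first frame, then keep any frame whose mean absolute pixel difference from the last kept keyframe exceeds a threshold. Frames are modeled as grayscale pixel grids, and `select_keyframes`, `mean_abs_diff`, and the threshold of 30 are hypothetical choices.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale frame: rows of 0..255 pixel values


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_keyframes(frames: Sequence[Frame], threshold: float = 30.0) -> List[int]:
    """Return indices of frames that differ enough from the last keyframe.

    The first frame is always a keyframe; a later frame becomes one when its
    mean absolute difference from the most recent keyframe exceeds `threshold`.
    """
    if not frames:
        return []
    keyframes = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[keyframes[-1]]) > threshold:
            keyframes.append(i)
    return keyframes
```

Comparing against the last keyframe rather than the previous frame prevents a slow fade from being missed, since small per-frame changes accumulate until the threshold is crossed.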
Taxonomy
Topics: Handwritten Text Recognition Techniques · Music and Audio Processing · Digital Media Forensic Detection
