Text Localization in Video Using Multiscale Weber's Local Descriptor
B.H. Shekar, Smitha M.L.

TL;DR
This paper introduces a multiscale Weber's Local Descriptor-based method for detecting and localizing text in videos, effectively handling various text sizes, fonts, and colors through a sequence of image processing steps.
Contribution
It presents a novel multiscale WLD approach, combined with morphological operations and connected component analysis, for accurate video text localization.
Findings
Effective detection of text in various sizes and fonts
High accuracy demonstrated on standard video datasets
Robust localization in complex video scenes
Abstract
In this paper, we propose a novel approach for detecting the text present in videos and scene images based on the Multiscale Weber's Local Descriptor (MWLD). Given an input video, the shots are identified and the key frames are extracted based on their spatio-temporal relationship. From each key frame, we detect the local region information using WLD with different radii and neighborhood relationships of pixel values, and thereby obtain intensity-enhanced key frames at multiple scales. These multiscale WLD key frames are merged, and the horizontal gradients are then computed using morphological operations. The results are binarized and the false positives are eliminated based on geometrical properties. Finally, we employ connected component analysis and a morphological dilation operation to determine the text regions, which aids text localization. The experimental…
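The multiscale WLD step described in the abstract can be illustrated with a minimal sketch. It assumes the standard WLD differential excitation, arctan(sum of (neighbour - centre) / centre), sampled from 8 neighbours on a square ring at each radius. The function names, the radii (1, 2, 3), the edge padding, and the averaging used to merge scales are all assumptions made for illustration; the abstract does not specify the authors' sampling pattern or merge rule.

```python
import numpy as np

def differential_excitation(gray, radius=1, eps=1e-6):
    """WLD differential excitation with 8 neighbours at distance `radius`.

    Assumption: neighbours lie on a square ring around each pixel; the
    paper's exact neighbourhood layout is not given in the abstract.
    """
    img = gray.astype(np.float64)
    h, w = img.shape
    # Replicate border pixels so every pixel has a full neighbourhood.
    padded = np.pad(img, radius, mode="edge")
    offsets = [(-radius, -radius), (-radius, 0), (-radius, radius),
               (0, -radius), (0, radius),
               (radius, -radius), (radius, 0), (radius, radius)]
    diff_sum = np.zeros_like(img)
    for dy, dx in offsets:
        # Shifted view of the image: neighbour at (y + dy, x + dx).
        neighbour = padded[radius + dy: radius + dy + h,
                           radius + dx: radius + dx + w]
        diff_sum += neighbour - img
    # eps avoids division by zero for black pixels.
    return np.arctan(diff_sum / (img + eps))

def multiscale_wld(gray, radii=(1, 2, 3)):
    """Merge per-radius excitation maps by averaging (hypothetical merge rule)."""
    maps = [differential_excitation(gray, r) for r in radii]
    return np.mean(maps, axis=0)
```

Calling `multiscale_wld(frame)` on a grayscale key frame returns a map with values in (-π/2, π/2), in which pixels that differ strongly from their surroundings, such as text strokes, have large magnitudes. The later stages named in the abstract (horizontal gradients, binarization, geometric filtering, connected component analysis) would operate on this map.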
