NENET: An Edge Learnable Network for Link Prediction in Scene Text
Mayank Kumar Singh, Sayan Banerjee, Shubhasis Chaudhuri

TL;DR
This paper introduces NENET, a graph neural network designed for link prediction in scene text detection, effectively connecting characters regardless of spatial separation or orientation, and achieves top results on SynthText.
Contribution
The paper proposes a novel GNN architecture that learns both node and edge features for linking characters in scene text detection, improving over existing methods.
Findings
Achieves top performance on the SynthText dataset
Effectively links spatially separated characters
Handles arbitrary character orientations
Abstract
Text detection in scenes based on deep neural networks has shown promising results. Instead of using word bounding box regression, recent state-of-the-art methods have started focusing on character bounding box and pixel-level prediction. This requires linking adjacent characters, which we do in this paper using a novel Graph Neural Network (GNN) architecture that learns both node and edge features, rather than only the node features learned by a typical GNN. The main advantage of using a GNN for link prediction lies in its ability to connect characters that are spatially separated and have arbitrary orientations. We demonstrate our concept on the well-known SynthText dataset, achieving top results compared to state-of-the-art methods.
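To make the idea of learning both node and edge features concrete, here is a minimal NumPy sketch of one message-passing round over a small character graph. It is not the NENET architecture, whose details are not given in the abstract. The function names (`edge_node_layer`, `link_probabilities`), the update rules, the feature sizes, and the random weights are all assumptions chosen for illustration. Nodes stand for detected characters, edges for candidate links between them, and the final score per edge is read as the probability that the two characters belong to the same word.

```python
import numpy as np

def edge_node_layer(h, e, edges, W_e, W_n):
    """One hypothetical message-passing round that updates edges and nodes.

    h:     (N, d) node features, one row per character
    e:     (M, d) edge features, one row per candidate link
    edges: (M, 2) integer array of (source, target) node indices
    """
    src, dst = edges[:, 0], edges[:, 1]
    # Edge update: each edge sees both endpoint features and its own state.
    e_new = np.tanh(np.concatenate([h[src], h[dst], e], axis=1) @ W_e)
    # Node update: sum the updated features of all incident edges.
    agg = np.zeros_like(h)
    np.add.at(agg, src, e_new)
    np.add.at(agg, dst, e_new)
    h_new = np.tanh(np.concatenate([h, agg], axis=1) @ W_n)
    return h_new, e_new

def link_probabilities(e, w):
    """Map each edge feature vector to a link probability with a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(e @ w)))

# Toy graph: 4 characters, 4 candidate links (all values are made up).
rng = np.random.default_rng(0)
N, d = 4, 8
edges = np.array([[0, 1], [1, 2], [2, 3], [0, 3]])
h = rng.normal(size=(N, d))           # stand-in for per-character features
e = rng.normal(size=(len(edges), d))  # stand-in for per-pair features
W_e = rng.normal(size=(3 * d, d)) * 0.1
W_n = rng.normal(size=(2 * d, d)) * 0.1
w_out = rng.normal(size=d)

h1, e1 = edge_node_layer(h, e, edges, W_e, W_n)
probs = link_probabilities(e1, w_out)
```

In a trained model the weights would be learned from labeled links, and the candidate edges need not be restricted to nearest neighbors, which is what would let a graph-based linker connect characters that are far apart or rotated relative to each other.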
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Graph Neural Network
