NENET: An Edge Learnable Network for Link Prediction in Scene Text

Mayank Kumar Singh; Sayan Banerjee; Shubhasis Chaudhuri

arXiv:2005.12147 · cs.LG · May 26, 2020 · 1 citation

Open Access

TL;DR

This paper introduces NENET, a graph neural network for link prediction in scene text detection. It links characters regardless of their spatial separation or orientation and reports top results on SynthText compared with state-of-the-art methods.

Contribution

The paper proposes a novel GNN architecture that learns both node and edge features, rather than node features alone as in a typical GNN, and uses it to link adjacent characters in scene text detection.

Findings

01 Achieves top performance on the SynthText dataset

02 Effectively links spatially separated characters

03 Handles arbitrary character orientations

Abstract

Scene text detection based on deep neural networks has shown promising results. Instead of regressing word bounding boxes, recent state-of-the-art methods have started focusing on character bounding boxes and pixel-level prediction. This makes it necessary to link adjacent characters, which we do in this paper using a novel Graph Neural Network (GNN) architecture that learns both node and edge features, as opposed to only the node features learned by a typical GNN. The main advantage of using a GNN for link prediction lies in its ability to connect characters that are spatially separated and have arbitrary orientations. We demonstrate our approach on the well-known SynthText dataset, achieving top results compared with state-of-the-art methods.
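The abstract does not spell out the NENET layer itself, so the sketch below is only a generic illustration of what it can mean for a GNN to update edge features alongside node features and then score candidate links. It is not the authors' architecture: the function names (`node_edge_layer`, `link_scores`), the feature choices (character centers as node features, center distance as the edge feature), the layer sizes, and the random, untrained weights are all assumptions made for illustration.

```python
import math
import random


def linear(weights, x):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def node_edge_layer(nodes, edges, w_edge, w_node):
    """One message-passing round that updates edge AND node features.

    nodes: {node_id: feature vector}, one node per detected character
    edges: {(i, j): feature vector}, one edge per candidate link
    """
    # Edge update: each edge sees its own features plus both endpoints.
    new_edges = {
        (i, j): relu(linear(w_edge, nodes[i] + nodes[j] + e))
        for (i, j), e in edges.items()
    }
    edge_dim = len(w_edge)  # output size of the edge update
    # Node update: each node sees its own features plus the mean of the
    # updated features on its incident edges.
    new_nodes = {}
    for n, h in nodes.items():
        incident = [e for (i, j), e in new_edges.items() if n in (i, j)]
        if incident:
            mean = [sum(col) / len(incident) for col in zip(*incident)]
        else:
            mean = [0.0] * edge_dim
        new_nodes[n] = relu(linear(w_node, h + mean))
    return new_nodes, new_edges


def link_scores(edges, w_out):
    """Map each edge feature vector to a link probability in (0, 1)."""
    return {
        key: sigmoid(sum(w * v for w, v in zip(w_out, e)))
        for key, e in edges.items()
    }


# Toy input: three characters described by (x, y) centers (assumed features).
nodes = {0: [0.10, 0.50], 1: [0.20, 0.52], 2: [0.90, 0.10]}
# Candidate links carry one feature: the distance between the two centers.
edges = {
    (i, j): [math.dist(nodes[i], nodes[j])]
    for i in nodes for j in nodes if i < j
}

rng = random.Random(0)
w_edge = random_matrix(4, 2 + 2 + 1, rng)  # (node_i, node_j, edge) -> 4 dims
w_node = random_matrix(2, 2 + 4, rng)      # (node, mean edge) -> 2 dims
w_out = [rng.uniform(-0.5, 0.5) for _ in range(4)]

new_nodes, new_edges = node_edge_layer(nodes, edges, w_edge, w_node)
scores = link_scores(new_edges, w_out)
for pair, p in sorted(scores.items()):
    print(pair, round(p, 3))
```

Because the weights are untrained, the printed probabilities carry no meaning; the point is only that edge features are first-class state that is updated and read out, rather than being derived from node features at the end.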

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Graph Neural Network