Representing text as abstract images enables image classifiers to also simultaneously classify text
Stephen M. Petrie, T'Mir D. Julius

TL;DR
This paper presents a novel method for converting text data into abstract images, so that image classifiers can be applied to text comparison tasks such as entity disambiguation. Applied to inventor name matching in US patents, the approach yields highly accurate results.
Contribution
The paper introduces a new text-to-image representation technique that allows image classification models to be applied to text comparison problems, and demonstrates it on inventor name disambiguation in US patent data.
Findings
High accuracy in inventor name disambiguation
Effective use of image classifiers on text comparison tasks
Potential applicability to broader NLP problems
Abstract
We introduce a novel method for converting text data into abstract image representations, which allows image-based processing techniques (e.g. image classification networks) to be applied to text-based comparison problems. We apply the technique to entity disambiguation of inventor names in US patents. The method involves converting text from each pairwise comparison between two inventor name records into a 2D RGB (stacked) image representation. We then train an image classification neural network to discriminate between such pairwise comparison images, and use the trained network to label each pair of records as either matched (same inventor) or non-matched (different inventors), obtaining highly accurate results. Our new text-to-image representation method could also be used more broadly for other NLP comparison problems, such as disambiguation of academic publications, or for…
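To make the idea of a stacked pairwise image concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' encoding: the abstract says only that each pairwise comparison becomes a 2D RGB (stacked) image, so the character-bigram grid, the channel assignment (red for the first record, green for the second, blue for their overlap), and the names `char_index`, `bigram_grid`, and `pair_image` are all hypothetical choices made for this example.

```python
# Hypothetical encoding for illustration only; the paper's actual
# text-to-image mapping is not specified in the abstract above.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
SIZE = len(ALPHABET) + 1  # last index collects every non-letter character


def char_index(ch):
    """Map a character to a grid coordinate; non-letters share the last slot."""
    pos = ALPHABET.find(ch.lower())
    return pos if pos >= 0 else SIZE - 1


def bigram_grid(text):
    """Count adjacent character pairs of `text` on a SIZE x SIZE grid."""
    grid = [[0] * SIZE for _ in range(SIZE)]
    for first, second in zip(text, text[1:]):
        grid[char_index(first)][char_index(second)] += 1
    return grid


def pair_image(record_a, record_b):
    """Stack two records into one SIZE x SIZE x 3 image (nested lists, 0-255)."""
    grid_a, grid_b = bigram_grid(record_a), bigram_grid(record_b)
    # Scale by the largest count in either grid so values stay within 0-255.
    peak = max(1, max(max(row) for row in grid_a), max(max(row) for row in grid_b))
    scale = 255 // peak
    image = []
    for row_a, row_b in zip(grid_a, grid_b):
        image.append([
            # Assumed channel layout: R = record A, G = record B, B = overlap.
            (a * scale, b * scale, min(a, b) * scale)
            for a, b in zip(row_a, row_b)
        ])
    return image


if __name__ == "__main__":
    img = pair_image("smith john", "smith jon")
    print(len(img), len(img[0]), len(img[0][0]))
```

Under this assumed layout, character pairs shared by both records light up all three channels, while pairs present in only one record appear in a single channel. That kind of visual pattern is what an image classifier could learn to separate into matched and non-matched pairs. Each such image would then be given a matched or non-matched label and fed to a standard image classification network for training.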
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Biomedical Text Mining and Ontologies
