DataWords: Getting Contrarian with Text, Structured Data and Explanations
Stephen I. Gallant, Mirza Nasir Hossain

TL;DR
This paper introduces DataWords, a method to combine free-text and structured data into a unified text-based representation, enabling improved classification and explainability using text-modeling algorithms.
Contribution
It proposes representing structured data as text sentences, DataWords, to integrate structured and free-text data for classification and explanation purposes.
Findings
Improved text classification performance with DataWords.
Enhanced explanations combining text and structured data.
Effective integration of structured data into text models.
Abstract
Our goal is to build classification models using a combination of free-text and structured data. To do this, we represent structured data by text sentences, DataWords, so that similar data items are mapped into the same sentence. This permits modeling a mixture of text and structured data by using only text-modeling algorithms. Several examples illustrate that it is possible to improve text classification performance by first running extraction tools (named entity recognition), then converting the output to DataWords, and adding the DataWords to the original text -- before model building and classification. This approach also allows us to produce explanations for inferences in terms of both free text and structured data.
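The abstract's central idea, mapping similar structured items to the same text so that a text-only model can consume them, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the token format (`field_value`), the numeric binning scheme, and the function names `bin_number`, `to_datawords`, and `augment` are all assumptions made for illustration.

```python
import re


def bin_number(value, edges):
    """Map a number to a coarse range label so nearby values share one label.

    `edges` is an ascending list of bin boundaries (an assumed scheme,
    not taken from the paper).
    """
    lower = None
    for edge in edges:
        if value < edge:
            return f"below{edge}" if lower is None else f"{lower}to{edge}"
        lower = edge
    return f"{lower}plus"


def to_datawords(record, numeric_bins):
    """Turn a structured record into a space-separated pseudo-sentence.

    Fields listed in `numeric_bins` are binned; all other values are
    lowercased and stripped of non-word characters so that variants such
    as "New York" and "new york" yield the same token.
    """
    tokens = []
    for field, value in sorted(record.items()):
        if field in numeric_bins:
            label = bin_number(value, numeric_bins[field])
        else:
            label = re.sub(r"\W+", "", str(value).lower())
        tokens.append(f"{field}_{label}")
    return " ".join(tokens)


def augment(text, record, numeric_bins):
    """Append the pseudo-sentence to the free text before model building."""
    return f"{text} {to_datawords(record, numeric_bins)}"


# Hypothetical example: two patients of similar age map to the same tokens.
bins = {"age": [18, 30, 50]}
a = to_datawords({"age": 34, "city": "New York"}, bins)
b = to_datawords({"age": 37, "city": "new york"}, bins)
combined = augment("Patient reports mild headache.", {"age": 34, "city": "New York"}, bins)
```

In this sketch the augmented string can be passed to any text classifier unchanged, and because each generated token names its source field, a token that the model weights heavily can be reported back as a structured-data explanation alongside ordinary words.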
Taxonomy
Topics: Topic Modeling · Semantic Web and Ontologies · Natural Language Processing Techniques
