DataWords: Getting Contrarian with Text, Structured Data and Explanations

Stephen I. Gallant; Mirza Nasir Hossain

arXiv:2111.05384 · cs.LG · February 18, 2022

TL;DR

This paper introduces DataWords, a method to combine free-text and structured data into a unified text-based representation, enabling improved classification and explainability using text-modeling algorithms.

Contribution

It proposes representing structured data as text sentences, DataWords, to integrate structured and free-text data for classification and explanation purposes.

Findings

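01 Adding DataWords derived from extraction tools (named entity recognition) to the original text can improve text classification performance, as several examples illustrate.

02 Explanations for inferences can be given in terms of both free text and structured data.

03 Structured data, once represented as DataWords sentences, can be modeled together with free text using only text-modeling algorithms.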

Abstract

Our goal is to build classification models using a combination of free-text and structured data. To do this, we represent structured data by text sentences, DataWords, so that similar data items are mapped into the same sentence. This permits modeling a mixture of text and structured data by using only text-modeling algorithms. Several examples illustrate that it is possible to improve text classification performance by first running extraction tools (named entity recognition), then converting the output to DataWords, and adding the DataWords to the original text -- before model building and classification. This approach also allows us to produce explanations for inferences in terms of both free text and structured data.
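
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`bin_number`, `to_datawords`, `augment`), the `FIELD_VALUE` token format, and the fixed-width binning used to make similar values share a token are all assumptions made here for illustration. The extraction step (named entity recognition) is assumed to have already produced a flat dictionary of fields.

```python
def bin_number(value, width):
    """Map a number to the lower edge of its bin so nearby values share a token."""
    return int(value // width) * width


def to_datawords(record, bin_widths=None):
    """Turn a flat dict of structured fields into one text sentence.

    The FIELD_VALUE token format is a hypothetical choice, not the paper's.
    """
    bin_widths = bin_widths or {}
    tokens = []
    for field in sorted(record):  # sorted so the sentence is deterministic
        value = record[field]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and field in bin_widths:
            # Similar numeric values collapse to the same token.
            value = bin_number(value, bin_widths[field])
        tokens.append(f"{field}_{value}".upper().replace(" ", "_"))
    return " ".join(tokens)


def augment(text, record, bin_widths=None):
    """Append the DataWords sentence to the original free text."""
    return f"{text} {to_datawords(record, bin_widths)}"


# Two records with similar ages map to the same sentence.
a = to_datawords({"age": 37, "org": "Acme Corp"}, {"age": 10})
b = to_datawords({"age": 32, "org": "Acme Corp"}, {"age": 10})
combined = augment("Patient reports chest pain.", {"age": 37}, {"age": 10})
```

The augmented string can then be passed to any text classifier unchanged, which is the point of the approach: the model sees structured fields as ordinary tokens alongside the free text.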

Taxonomy

Topics: Topic Modeling · Semantic Web and Ontologies · Natural Language Processing Techniques