Improving Text Relationship Modeling with Artificial Data

Peter Organisciak; Maggie Ryan

arXiv:2010.14640 · cs.DL · August 25, 2022

Open Access

TL;DR

This paper demonstrates that synthetic data generation significantly enhances the performance of deep neural network classifiers in digital library relationship classification, especially for whole-part relationships, and explores learning new relationship classes from artificial data.

Contribution

It introduces a synthetic data approach for relationship classification in digital libraries and evaluates its effectiveness, showing substantial performance improvements.

Findings

01

Synthetic data improves a deep neural network classifier by 91% on whole-part relationship classification between books.

02

Artificial data enables learning of new relationship classes.

03

Synthetic data supports robust supervised learning with limited labeled data.

Abstract

Data augmentation uses artificially created examples to support supervised machine learning, adding robustness to the resulting models and helping to account for limited availability of labelled data. We apply and evaluate a synthetic data approach to relationship classification in digital libraries, generating artificial books with relationships that are common in digital libraries but not easily inferred from existing metadata. We find that for classification on whole-part relationships between books, synthetic data improves a deep neural network classifier by 91%. Further, we consider the ability of synthetic data to learn a useful new text relationship class from fully artificial training data.
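As a rough illustration of what generating artificial books with a whole-part relationship could look like, the sketch below splits one book into contiguous artificial "parts" and labels each (whole, part) pair. The abstract does not describe the authors' procedure, so this is an assumption throughout: the function name `make_whole_part_pairs`, the page-list representation, and the random contiguous split are all hypothetical.

```python
import random
from typing import List, Tuple

Pair = Tuple[List[str], List[str], str]


def make_whole_part_pairs(pages: List[str], n_parts: int, seed: int = 0) -> List[Pair]:
    """Split one book (a list of page texts) into contiguous artificial parts.

    Hypothetical helper, not the paper's method. Returns one
    (whole, part, label) triple per part, with the label "whole-part".
    """
    if not 1 <= n_parts <= len(pages):
        raise ValueError("n_parts must be between 1 and the number of pages")

    rng = random.Random(seed)
    # Pick n_parts - 1 distinct cut points strictly inside the book,
    # so every artificial part contains at least one page.
    cuts = sorted(rng.sample(range(1, len(pages)), n_parts - 1))
    bounds = [0] + cuts + [len(pages)]

    parts = [pages[start:end] for start, end in zip(bounds, bounds[1:])]
    return [(pages, part, "whole-part") for part in parts]


if __name__ == "__main__":
    book = [f"page {i}" for i in range(10)]
    for whole, part, label in make_whole_part_pairs(book, n_parts=3):
        print(label, len(whole), len(part))
```

Pairs produced this way could be added to a training set alongside real labelled examples; how the artificial and real examples are mixed is a separate design choice that the abstract does not specify.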

Taxonomy

Topics: Topic Modeling · Music and Audio Processing · Semantic Web and Ontologies