Nonsymbolic Text Representation

Hinrich Schuetze; Heike Adel; Ehsaneddin Asgari

arXiv:1610.00479 · cs.CL · May 2, 2017 · 1 citation

Open Access

TL;DR

This paper presents a nonsymbolic text representation model that does not rely on segmentation or tokenization and outperforms prior work on an information extraction task and a text denoising task.

Contribution

It introduces the first generic text representation model that is completely nonsymbolic: neither training the model on a corpus nor applying it to a new text requires segmentation or tokenization.

Findings

1. Outperforms prior work on an information extraction task

2. Outperforms prior work on a text denoising task

3. Operates without segmentation or tokenization, both in training and when representing new text

Abstract

We introduce the first generic text representation model that is completely nonsymbolic, i.e., it does not require the availability of a segmentation or tokenization method that attempts to identify words or other symbolic units in text. This applies to training the parameters of the model on a training corpus as well as to applying it when computing the representation of a new text. We show that our model performs better than prior work on an information extraction and a text denoising task.
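A minimal sketch of what segmentation-free input can look like, assuming a plain bag of character n-grams as a stand-in. This is not the authors' model (whose parameters are trained on a corpus); it only illustrates computing a representation of a text without ever identifying words. The function names `char_ngrams`, `represent` and `cosine`, and the n-gram range of 3 to 5 characters, are hypothetical choices made for this example.

```python
import math
from collections import Counter

# Illustrative stand-in (bag of character n-grams), NOT the paper's trained model.


def char_ngrams(text: str, n_min: int = 3, n_max: int = 5) -> list[str]:
    """Return every character n-gram of the raw string, spaces and punctuation included.

    No tokenizer is consulted, so n-grams freely span word boundaries.
    """
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def represent(text: str) -> Counter:
    """Sparse vector for a text: maps each character n-gram to its count."""
    return Counter(char_ngrams(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


clean = represent("the cat sat on the mat")
noisy = represent("thecat saton themat")  # spaces dropped, as in noisy text
other = represent("stock prices fell sharply")
print(cosine(clean, noisy) > cosine(clean, other))  # True
```

Because n-grams are read straight off the character stream, `thecat` still shares `the` and `cat` with the clean sentence, whereas a word-level tokenizer would treat it as a single unseen token.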

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques