Nonsymbolic Text Representation
Hinrich Schuetze, Heike Adel, Ehsaneddin Asgari

TL;DR
This paper presents a nonsymbolic text representation model that requires no segmentation or tokenization, and reports better performance than prior work on one information extraction task and one text denoising task.
Contribution
It introduces the first generic text representation model that is completely nonsymbolic: neither training the model nor applying it to new text requires a segmentation or tokenization step.
Findings
Outperforms prior work on an information extraction task
Outperforms prior work on a text denoising task
Operates without segmentation or tokenization
Abstract
We introduce the first generic text representation model that is completely nonsymbolic, i.e., it does not require the availability of a segmentation or tokenization method that attempts to identify words or other symbolic units in text. This applies to training the parameters of the model on a training corpus as well as to applying it when computing the representation of a new text. We show that our model performs better than prior work on an information extraction and a text denoising task.
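To make "no segmentation or tokenization" concrete, here is a minimal sketch of one way to derive units from raw text without identifying words: take every overlapping character n-gram and treat whitespace like any other character. This is an illustration only and is not the model described in the paper; the function names `char_ngrams` and `ngram_counts` and the choice of `n` are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return every overlapping character n-gram of the raw string.

    Whitespace and punctuation are treated like any other character,
    so no word boundaries are ever identified. (Illustrative helper,
    not taken from the paper.)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(text: str, n: int = 3) -> Counter:
    """Count character n-grams; a simple segmentation-free feature map."""
    return Counter(char_ngrams(text, n))


if __name__ == "__main__":
    # The space inside "to be" is part of the n-grams, not a delimiter.
    print(char_ngrams("to be", n=3))
```

The same function can be applied unchanged to a training corpus and to new text, which mirrors the abstract's requirement that both stages work without a tokenizer.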
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
