TL;DR
The paper introduces cleanNLP, a fast R package that converts textual data into normalized tables using Stanford's CoreNLP, supporting multiple languages and various NLP annotation tasks.
Contribution
It provides a unified, efficient data model for NLP tasks in R, integrating multiple annotation tools into a single pipeline for multilingual text processing.
Findings
Supports English, French, German, and Spanish.
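Annotators include tokenization, part-of-speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction.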
Enables streamlined NLP data analysis in R.
Abstract
The package cleanNLP provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline utilizes Stanford's CoreNLP library, exposing a number of annotation tasks for text written in English, French, German, and Spanish. Annotators include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction.
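To make the idea of "normalized tables" concrete, here is a minimal sketch, written in plain Python rather than R and not using cleanNLP or CoreNLP at all. It assumes a toy regex-based sentence splitter and tokenizer, and the table layout and column names (`doc_id`, `sid`, `tid`, `word`, `n_char`) are hypothetical choices for illustration, not the package's actual schema. The point is only that each annotation level becomes its own table, with rows linked by shared keys.

```python
import re
from collections import Counter


def annotate(corpus):
    """Turn a list of raw strings into two linked tables (lists of row dicts).

    Hypothetical layout: one row per document, one row per token.
    Token rows point back to their document through doc_id.
    """
    documents, tokens = [], []
    for doc_id, text in enumerate(corpus, start=1):
        documents.append({"doc_id": doc_id, "n_char": len(text)})
        # Toy sentence split after ., ! or ? (a real pipeline uses a trained model).
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        for sid, sentence in enumerate(sentences, start=1):
            # Toy tokenization: runs of word characters, or single punctuation marks.
            for tid, word in enumerate(re.findall(r"\w+|[^\w\s]", sentence), start=1):
                tokens.append({"doc_id": doc_id, "sid": sid, "tid": tid, "word": word})
    return documents, tokens


corpus = ["The cat sat. It purred!", "Dogs bark."]
documents, tokens = annotate(corpus)

# Because the token table carries doc_id, per-document summaries are a simple group-by.
tokens_per_doc = Counter(row["doc_id"] for row in tokens)
```

Further annotation layers (for example entities or dependencies) would follow the same pattern under this sketch: a separate table per layer, keyed by document, sentence, and token identifiers so the tables can be joined as needed.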
