Data-to-Value: An Evaluation-First Methodology for Natural Language Projects
Jochen L. Leidner

TL;DR
The paper introduces 'Data to Value' (D2V), a new evaluation-first methodology tailored for large-scale natural language processing projects, addressing scalability, unstructured data, and non-technical factors.
Contribution
It presents a novel methodology specifically designed for big data NLP projects, filling gaps left by traditional data mining methodologies.
Findings
D2V aims to help large-scale NLP projects succeed more consistently.
The methodology incorporates a comprehensive question catalog for better project guidance.
It bridges technical and non-technical project aspects effectively.
Abstract
Big data, i.e. collecting, storing and processing of data at scale, has recently become possible due to the arrival of clusters of commodity computers powered by application-level distributed parallel operating systems like HDFS/Hadoop/Spark, and such infrastructures have revolutionized data mining at scale. For data mining projects to succeed more consistently, some methodologies were developed (e.g. CRISP-DM, SEMMA, KDD), but these do not account for (1) very large scales of processing, (2) dealing with textual (unstructured) data (i.e. Natural Language Processing (NLP), "text analytics"), and (3) non-technical considerations (e.g. legal, ethical, project managerial aspects). To address these shortcomings, a new methodology, called "Data to Value" (D2V), is introduced, which is guided by a detailed catalog of questions in order to avoid a disconnect of big data text analytics project…
Taxonomy
Topics: Semantic Web and Ontologies · Data Mining Algorithms and Applications · Big Data and Business Intelligence
