DMOps: Data Management Operation and Recipes

Eujeong Choi; Chanjun Park

arXiv:2301.01228 · cs.DB · June 27, 2023 · 1 citation


Open Access

TL;DR

This paper introduces DMOps, a framework derived from real-world NLP data management experiences, to guide industry in optimizing dataset creation and streamlining data operations for NLP products.

Contribution

It proposes the DMOps framework, providing practical recipes and baseline practices for effective NLP data management in industry settings.

Findings

01. DMOps offers a structured approach to NLP data management.
02. The framework is based on real-world industry experiences.
03. It aims to improve efficiency in dataset building for NLP.

Abstract

Data-centric AI has shed light on the significance of data within the machine learning (ML) pipeline. Recognizing this, academia, industry, and government departments have suggested various NLP data research initiatives. While the ability to utilize existing data is essential, the ability to build a dataset has become more critical than ever, especially in industry. In consideration of this trend, we propose "Data Management Operations and Recipes" (DMOps) to guide the industry in optimizing the building of datasets for NLP products. This paper presents the concept of DMOps, which is derived from real-world experiences with NLP data management and aims to streamline data operations by offering a baseline.

Taxonomy

Topics: Machine Learning and Data Classification · Data Quality and Management · Data Stream Mining Techniques