Data Collection and Labeling Techniques for Machine Learning

Qianyu Huang; Tongfang Zhao

arXiv:2407.12793·cs.DB·July 19, 2024·3 cites

TL;DR

This paper reviews current data collection and labeling techniques for machine learning, covering their importance, their challenges, and future research directions for improving scalability and efficiency.

Contribution

It offers a comprehensive overview integrating machine learning and data management perspectives, highlighting recent advances and identifying gaps for future research.

Findings

01 Survey of state-of-the-art data collection methods

02 Analysis of data labeling techniques and challenges

03 Identification of future research directions

Abstract

Data collection and labeling are critical bottlenecks in the deployment of machine learning applications. With the increasing complexity and diversity of applications, the need for efficient and scalable data collection and labeling techniques has become paramount. This paper provides a review of the state-of-the-art methods in data collection, data labeling, and the improvement of existing data and models. By integrating perspectives from both the machine learning and data management communities, we aim to provide a holistic view of the current landscape and identify future research directions.

Taxonomy

Topics: Machine Learning and Data Classification