Evaluating and Crafting Datasets Effective for Deep Learning With Data Maps

Jay Bishnu; Andrew Gondoputro

arXiv:2208.10033 · cs.LG · October 25, 2022 · 1 citation

Open Access

TL;DR

This paper proposes a method for creating smaller, high-quality datasets for deep learning by selecting samples based on their difficulty level, maintaining model accuracy while reducing resource requirements.

Contribution

It introduces a novel dataset curation approach that focuses on sample difficulty to optimize training efficiency and effectiveness.

Findings

01 Smaller datasets curated by sample difficulty can reach accuracy comparable to that of the full dataset.

02 The method reduces training time and resource consumption.

03 The method improves dataset quality assessment for deep learning models.

Abstract

Rapid development in deep learning model construction has prompted an increased need for appropriate training data. The popularity of large datasets - sometimes known as "big data" - has diverted attention from assessing their quality. Training on large datasets often requires excessive system resources and an infeasible amount of time. Furthermore, the supervised machine learning process has yet to be fully automated: for supervised learning, large datasets require more time for manually labeling samples. We propose a method of curating smaller datasets with comparable out-of-distribution model accuracy after an initial training session using an appropriate distribution of samples classified by how difficult it is for a model to learn from them.
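
The curation idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the commonly used data-map statistics: for each sample, the mean (confidence) and standard deviation (variability) of the model's probability for the true label, recorded once per epoch during the initial training session. The function names, the thresholds, and the `mix` proportions are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean, pstdev


def data_map(gold_probs):
    """Return (confidence, variability) per sample.

    gold_probs[i] holds the model's probability for sample i's true
    label at each epoch of the initial training session.
    """
    return [(mean(p), pstdev(p)) for p in gold_probs]


def categorize(confidence, variability, var_hi=0.2, conf_mid=0.5):
    """Assign a difficulty label. Thresholds are illustrative assumptions."""
    if variability >= var_hi:
        return "ambiguous"  # predictions fluctuate across epochs
    if confidence >= conf_mid:
        return "easy"       # consistently predicted correctly
    return "hard"           # consistently predicted incorrectly


def curate(gold_probs, mix, budget, seed=0):
    """Pick up to `budget` sample indices following the difficulty mix.

    mix maps a difficulty label to the fraction of the budget it
    should fill, e.g. {"easy": 0.5, "ambiguous": 0.25, "hard": 0.25}.
    """
    rng = random.Random(seed)
    buckets = {"easy": [], "ambiguous": [], "hard": []}
    for index, (conf, var) in enumerate(data_map(gold_probs)):
        buckets[categorize(conf, var)].append(index)

    chosen = []
    for label, fraction in mix.items():
        # Never request more samples than the bucket contains.
        k = min(len(buckets[label]), round(fraction * budget))
        chosen.extend(rng.sample(buckets[label], k))
    return sorted(chosen)
```

Calling `curate(gold_probs, mix, budget)` returns the indices of the samples to keep in the smaller dataset. Which mix of easy, ambiguous, and hard samples works best is the empirical question the paper addresses, so the proportions passed in here should be treated as a parameter to tune rather than a recommendation.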

Taxonomy

Topics: Machine Learning and Data Classification · Time Series Analysis and Forecasting · Image Processing and 3D Reconstruction