Shuffler: A Large Scale Data Management Tool for ML in Computer Vision

Evgeny Toropov; Paola A. Buitrago; Jose M. F. Moura

arXiv:2104.05125 · cs.CV · April 13, 2021

TL;DR

Shuffler is an open-source data management tool designed for flexible manipulation of large computer vision datasets, supporting over 40 operations and easy extensibility to enhance ML workflows.

Contribution

The paper introduces a comprehensive framework, built on a relational database, for managing and manipulating computer vision datasets throughout ML pipelines, addressing a gap in existing tools.

Findings

1. Supports over 40 data handling operations
2. Compatible with major computer vision datasets
3. Easily extensible for new operations and datasets

Abstract

Datasets in the computer vision academic research community are primarily static. Once a dataset is accepted as a benchmark for a computer vision task, researchers working on that task do not alter it, so that their results remain reproducible. At the same time, when exploring new tasks and new applications, datasets tend to be ever-changing. A practitioner may combine existing public datasets, filter images or objects in them, change annotations or add new ones to fit the task at hand, visualize sample images, or output statistics in the form of text or plots. In fact, datasets change as practitioners experiment with data as much as with algorithms, trying to make the most of machine learning models. Given that ML and deep learning call for large volumes of data to produce satisfactory results, it is no surprise that the resulting data and software management…
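To make the relational-database idea concrete, here is a minimal sketch of how operations such as filtering objects, changing annotations, and printing statistics could map onto SQL statements. It uses only Python's standard `sqlite3` module. The table layout, column names, and helper functions (`build_demo_db`, `filter_small_objects`, `rename_label`, `count_by_label`) are assumptions made for illustration and are not taken from Shuffler's actual schema or API.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory annotation database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE images (
            imagefile TEXT PRIMARY KEY,
            width INTEGER,
            height INTEGER
        );
        CREATE TABLE objects (
            objectid INTEGER PRIMARY KEY,
            imagefile TEXT REFERENCES images(imagefile),
            name TEXT,
            x1 INTEGER, y1 INTEGER, width INTEGER, height INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO images VALUES (?, ?, ?)",
        [("a.jpg", 640, 480), ("b.jpg", 640, 480)],
    )
    conn.executemany(
        "INSERT INTO objects (imagefile, name, x1, y1, width, height) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("a.jpg", "car", 10, 20, 100, 50),
            ("a.jpg", "bus", 200, 40, 8, 6),   # tiny box, removed by the filter
            ("b.jpg", "car", 5, 5, 30, 20),
        ],
    )
    return conn


def filter_small_objects(conn, min_side):
    """Delete objects whose bounding box is narrower or shorter than min_side."""
    conn.execute(
        "DELETE FROM objects WHERE width < ? OR height < ?",
        (min_side, min_side),
    )


def rename_label(conn, old, new):
    """Change an annotation label in place."""
    conn.execute("UPDATE objects SET name = ? WHERE name = ?", (new, old))


def count_by_label(conn):
    """Return a {label: number_of_objects} dictionary."""
    return dict(conn.execute("SELECT name, COUNT(*) FROM objects GROUP BY name"))


conn = build_demo_db()
stats_before = count_by_label(conn)
filter_small_objects(conn, min_side=10)
rename_label(conn, "car", "vehicle")
stats_after = count_by_label(conn)
```

Because each operation is a single SQL statement over shared tables, steps like these can be chained in any order, which is the kind of flexibility a database-backed design is meant to offer.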

Code & Models

Repositories

kukuruza/shuffler (PyTorch, official)
