Collaborative Unlabeled Data Optimization
Xinyi Shang, Peng Sun, Fengyuan Liu, Tao Lin

TL;DR
This paper introduces CoOpt, a data-centric framework that optimizes unlabeled data for deep learning. By encoding knowledge directly into the data rather than into model parameters, it aims to make training more efficient, scalable, and reusable, and it reports accuracy gains of 13.6% on Tiny-ImageNet and 6.8% on ImageNet-1K.
Contribution
The paper presents CoOpt, a novel parallelized framework for collaborative unlabeled data optimization that enhances data utility and training efficiency beyond existing model-centric methods.
Findings
Achieved 13.6% improvement on Tiny-ImageNet
Achieved 6.8% improvement on ImageNet-1K
Achieved training speedups of 1.94x and 1.2x
Abstract
This paper pioneers a novel data-centric paradigm to maximize the utility of unlabeled data, tackling a critical question: How can we enhance the efficiency and sustainability of deep learning training by optimizing the data itself? We begin by identifying three key limitations in existing model-centric approaches, all rooted in a shared bottleneck: knowledge extracted from data is locked to model parameters, hindering its reusability and scalability. To this end, we propose CoOpt, a highly efficient, parallelized framework for collaborative unlabeled data optimization, thereby effectively encoding knowledge into the data itself. By distributing unlabeled data and leveraging publicly available task-agnostic models, CoOpt facilitates scalable, reusable, and sustainable training pipelines. Extensive experiments across diverse datasets and architectures demonstrate its efficacy and…
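The workflow the abstract outlines (distribute unlabeled data, let each participant apply a publicly available task-agnostic model, and store the resulting knowledge alongside the data) can be illustrated with a toy sketch. The abstract does not say how targets are computed or combined, so everything below is an assumption made for illustration: `make_prior_encoder` is a hypothetical stand-in that uses a fixed random projection in place of a real pretrained model, and `split_into_shards`, `optimize_shard`, and `collaborative_optimize` are invented names, not CoOpt's API.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def make_prior_encoder(in_dim, out_dim, seed=0):
    """Hypothetical stand-in for a public task-agnostic model.

    A fixed random projection followed by tanh and L2 normalization;
    a real pipeline would load a pretrained encoder instead.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(out_dim)] for _ in range(in_dim)]

    def encode(sample):
        features = [
            math.tanh(sum(x * weights[i][j] for i, x in enumerate(sample)))
            for j in range(out_dim)
        ]
        norm = math.sqrt(sum(f * f for f in features)) or 1.0
        return [f / norm for f in features]

    return encode


def split_into_shards(samples, num_shards):
    """Split samples into contiguous, near-equal shards, preserving order."""
    size = max(1, math.ceil(len(samples) / num_shards))
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def optimize_shard(shard, encoder):
    """One participant's job: pair each sample with an encoder-derived target."""
    return [(sample, encoder(sample)) for sample in shard]


def collaborative_optimize(samples, num_participants, encoder):
    """Process shards in parallel and merge them into one reusable dataset."""
    shards = split_into_shards(samples, num_participants)
    with ThreadPoolExecutor(max_workers=num_participants) as pool:
        # pool.map yields results in shard order, so sample order is kept.
        per_shard = pool.map(lambda shard: optimize_shard(shard, encoder), shards)
        return [pair for shard_pairs in per_shard for pair in shard_pairs]


# Synthetic "unlabeled data": 200 random 16-dimensional samples.
rng = random.Random(42)
unlabeled = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]

encoder = make_prior_encoder(in_dim=16, out_dim=4)
optimized = collaborative_optimize(unlabeled, num_participants=4, encoder=encoder)
```

The point of the sketch is only that `optimized` is a plain list of (sample, target) pairs: once produced, it could be saved and reused to train different models without re-running the encoder, which is the reusability argument the abstract makes against keeping knowledge locked in model parameters.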
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
