Collaborative Unlabeled Data Optimization

Xinyi Shang; Peng Sun; Fengyuan Liu; Tao Lin

arXiv:2505.14117 · cs.LG · October 13, 2025

Open Access

TL;DR

This paper introduces CoOpt, a data-centric framework that optimizes unlabeled data for deep learning. By encoding knowledge directly into the data rather than into model parameters, it aims to make training more efficient, scalable, and reusable; the reported gains are 13.6% on Tiny-ImageNet and 6.8% on ImageNet-1K.

Contribution

The paper presents CoOpt, a novel parallelized framework for collaborative unlabeled data optimization that enhances data utility and training efficiency beyond existing model-centric methods.

Findings

01. 13.6% improvement on Tiny-ImageNet

02. 6.8% improvement on ImageNet-1K

03. Training speedups of 1.94x and 1.2x

Abstract

This paper pioneers a novel data-centric paradigm to maximize the utility of unlabeled data, tackling a critical question: How can we enhance the efficiency and sustainability of deep learning training by optimizing the data itself? We begin by identifying three key limitations in existing model-centric approaches, all rooted in a shared bottleneck: knowledge extracted from data is locked into model parameters, which hinders its reusability and scalability. To address this, we propose CoOpt, a highly efficient, parallelized framework for collaborative unlabeled data optimization that encodes knowledge into the data itself. By distributing unlabeled data and leveraging publicly available task-agnostic models, CoOpt facilitates scalable, reusable, and sustainable training pipelines. Extensive experiments across diverse datasets and architectures demonstrate its efficacy and…
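The abstract describes the workflow only at a high level (distribute the unlabeled data, let publicly available task-agnostic models encode knowledge into it), so the following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. Every name here (`make_prior_model`, `split_into_shards`, `optimize_shard`, `collaborative_optimize`) is hypothetical, the "prior model" is a fixed random linear map standing in for a real pretrained encoder, and the "optimized" data is simply each sample paired with a model-derived target.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_prior_model(seed, in_dim, out_dim):
    """Hypothetical stand-in for a public task-agnostic encoder:
    a fixed random linear map from in_dim to out_dim."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def encode(x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return encode


def split_into_shards(samples, num_participants):
    """Round-robin partition of the unlabeled pool across participants."""
    return [samples[i::num_participants] for i in range(num_participants)]


def optimize_shard(shard, prior_model):
    """One participant's work: attach a model-derived target to each sample.
    (Assumption: 'optimized data' means sample/target pairs.)"""
    return [(x, prior_model(x)) for x in shard]


def collaborative_optimize(samples, prior_models):
    """Process all shards in parallel, one prior model per participant,
    then merge the results into a single reusable dataset."""
    shards = split_into_shards(samples, len(prior_models))
    with ThreadPoolExecutor(max_workers=len(prior_models)) as pool:
        results = list(pool.map(optimize_shard, shards, prior_models))
    return [pair for shard in results for pair in shard]


# Toy usage: 10 unlabeled 8-dimensional samples, 3 participants.
rng = random.Random(0)
unlabeled = [[rng.random() for _ in range(8)] for _ in range(10)]
priors = [make_prior_model(seed, in_dim=8, out_dim=4) for seed in range(3)]
optimized = collaborative_optimize(unlabeled, priors)
print(len(optimized), len(optimized[0][1]))  # prints: 10 4
```

The merged list can be stored and reused to train any downstream model, which is the sense in which knowledge lives in the data rather than in one model's parameters. Note that targets produced by different prior models live in different spaces; this sketch makes no attempt to reconcile them.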

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning