Efficiency for Free: Ideal Data Are Transportable Representations

Peng Sun; Yi Jiang; Tao Lin

arXiv:2405.14669 · cs.LG · November 4, 2024

TL;DR

This paper shows that a publicly available, task-agnostic prior model can be used to generate efficient training data, substantially reducing computational cost while maintaining accuracy in representation learning.

Contribution

It introduces the idea of using task-agnostic prior models to produce efficient data, enabling faster and more cost-effective representation learning.

Findings

01

Using a ResNet-18 pre-trained on CIFAR-10 as the prior model for ResNet-50 training on ImageNet-1K reduces computational cost by 50%.

02

Efficient data produced with prior models yield accuracy comparable to traditional methods.

03

The approach accelerates representation learning without sacrificing performance.

Abstract

Data, the seminal opportunity and challenge in modern machine learning, currently constrains the scalability of representation learning and impedes the pace of model evolution. In this work, we investigate the efficiency properties of data from both optimization and generalization perspectives. Our theoretical and empirical analysis reveals an unexpected finding: for a given task, utilizing a publicly available, task- and architecture-agnostic model (referred to as the 'prior model' in this paper) can effectively produce efficient data. Building on this insight, we propose the Representation Learning Accelerator (ReLA), which promotes the formation and utilization of efficient data, thereby accelerating representation learning. Utilizing a ResNet-18 pre-trained on CIFAR-10 as a prior model to inform ResNet-50 training on ImageNet-1K reduces computational costs by 50% while…
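
The abstract does not spell out how ReLA forms efficient data, so the following is only a rough illustration of the general idea of pairing inputs with representations from a frozen prior model. It is a minimal NumPy sketch under stated assumptions: the "prior model" is a hypothetical fixed random projection with tanh (standing in for a network such as the ResNet-18 above), the "efficient data" are pairs of inputs and L2-normalized prior representations, and a small two-layer student is trained by plain gradient descent to regress onto those representations. This is a generic representation-target distillation sketch, not the paper's ReLA algorithm; the function names `prior_model`, `make_efficient_data`, and `train_student` are hypothetical.

```python
import numpy as np


def prior_model(x: np.ndarray, w_prior: np.ndarray) -> np.ndarray:
    """Hypothetical frozen prior: a fixed random projection followed by tanh."""
    return np.tanh(x @ w_prior)


def make_efficient_data(x: np.ndarray, w_prior: np.ndarray):
    """Pair each input with the L2-normalized representation from the frozen prior."""
    z = prior_model(x, w_prior)
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    return x, z


def train_student(x, z, hidden=32, lr=0.05, steps=300, seed=0):
    """Fit a two-layer ReLU student to the prior's representations (MSE, gradient descent)."""
    rng = np.random.default_rng(seed)
    n, d_in = x.shape
    d_out = z.shape[1]
    w1 = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, hidden))
    w2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, d_out))
    losses = []
    for _ in range(steps):
        h = np.maximum(x @ w1, 0.0)      # student features (ReLU)
        pred = h @ w2                    # linear head into the prior's representation space
        err = pred - z
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Manual backpropagation of the mean squared error.
        g_pred = 2.0 * err / n
        g_w2 = h.T @ g_pred
        g_h = g_pred @ w2.T
        g_h[h <= 0.0] = 0.0              # ReLU gradient mask
        g_w1 = x.T @ g_h
        w1 -= lr * g_w1
        w2 -= lr * g_w2
    return losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(256, 16))       # toy inputs
    w_prior = rng.normal(size=(16, 8))   # frozen prior weights (never updated)
    x, z = make_efficient_data(x, w_prior)
    losses = train_student(x, z)
    print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Replacing the random projection with a real pre-trained network and the two-layer student with a larger backbone would keep the same structure; whether this resembles what ReLA actually does should be checked against the paper and the lins-lab/rela repository.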

Code & Models

Repositories

lins-lab/rela (PyTorch, official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Algorithms and Data Compression

Methods: Bootstrap Your Own Latent · Contrastive Language-Image Pre-training