Super-model ecosystem: A domain-adaptation perspective

Fengxiang He; Dacheng Tao

arXiv:2208.14092·cs.LG·August 31, 2022

TL;DR

This paper provides a theoretical foundation for the super-model paradigm in AI, modeling its training process as a two-stage diffusion and establishing bounds on generalization error related to domain discrepancy.

Contribution

It introduces a mathematical model of super-model training as a two-stage diffusion process and derives a PAC-Bayesian generalization bound for domain adaptation.

Findings

01. The training process can be modeled by Uhlenbeck-Ornstein diffusion converging to Maxwell-Boltzmann distributions.

02. The generalization error during fine-tuning is the dominant factor in domain adaptation.

03. Generalization is influenced by a new measure of domain discrepancy based on covariance matrices and local minima shift.

Abstract

This paper attempts to establish the theoretical foundation for the emerging super-model paradigm via domain adaptation, where one first trains a very large-scale model, i.e., a super model (called a foundation model in some other papers), on a large amount of data and then adapts it to various specific domains. Super-model paradigms help reduce computational and data costs and carbon emissions, which is critical to the AI industry, especially to the very large number of small and medium-sized enterprises. We model the super-model paradigm as a two-stage diffusion process: (1) in the pre-training stage, the model parameter diffuses from random initializations and converges to a steady distribution; and (2) in the fine-tuning stage, the model parameter is transported to another steady distribution. Both training stages can be mathematically modeled by the Uhlenbeck-Ornstein process which converges to two…
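The two-stage picture in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's model: it assumes a one-dimensional Ornstein-Uhlenbeck process, d(theta) = -rate * (theta - minimum) dt + noise dW, discretized with Euler-Maruyama, whose stationary law is Gaussian with mean `minimum` and variance `noise**2 / (2 * rate)`. The function name `simulate_ou` and every numeric value (rates, noise level, the two minima, step size) are hypothetical choices made only for illustration.

```python
import numpy as np


def simulate_ou(theta0, minimum, rate, noise, dt, steps, rng):
    """Euler-Maruyama steps for d(theta) = -rate*(theta - minimum) dt + noise dW.

    theta0 is an array of independent particles; the returned array holds
    their positions after `steps` updates.
    """
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        # Brownian increment over a step of length dt has std sqrt(dt).
        dw = rng.normal(0.0, np.sqrt(dt), size=theta.shape)
        theta = theta - rate * (theta - minimum) * dt + noise * dw
    return theta


rng = np.random.default_rng(0)

# Hypothetical settings, chosen only so both stages reach steady state.
RATE, NOISE, DT, STEPS = 1.0, 0.5, 0.01, 1000
PRETRAIN_MINIMUM, FINETUNE_MINIMUM = 1.0, 2.0

# Stage 1 ("pre-training"): diffuse from widely spread random initial values.
init = rng.normal(0.0, 3.0, size=10_000)
pre = simulate_ou(init, PRETRAIN_MINIMUM, RATE, NOISE, DT, STEPS, rng)

# Stage 2 ("fine-tuning"): start from the stage-1 samples and drift toward
# a shifted minimum, ending in a second steady distribution.
fine = simulate_ou(pre, FINETUNE_MINIMUM, RATE, NOISE, DT, STEPS, rng)

# Analytic stationary variance of the continuous-time process.
stationary_var = NOISE**2 / (2 * RATE)

print("pre-training  mean/var:", pre.mean(), pre.var())
print("fine-tuning   mean/var:", fine.mean(), fine.var())
print("analytic stationary var:", stationary_var)
```

In this toy setting the sample mean of each stage should sit close to that stage's minimum and the sample variance close to `noise**2 / (2 * rate)`, up to Monte Carlo and discretization error. The shift between the two minima is a stand-in for the kind of change between domains that the summary refers to, not the paper's discrepancy measure.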

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: Diffusion