A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

Ce Zhou (1), Qian Li (2), Chen Li (2), Jun Yu (3), Yixin Liu (3), Guangjing Wang (1), Kai Zhang (3), Cheng Ji (2), Qiben Yan (1), Lifang He (3), Hao Peng (2), Jianxin Li (2), Jia Wu (4), Ziwei Liu (5), Pengtao Xie (6), Caiming Xiong (7), Jian Pei (8), Philip S. Yu (9), Lichao Sun (3) ((1) Michigan State University, (2) Beihang University, (3) Lehigh University, (4) Macquarie University, (5) Nanyang Technological University, (6) University of California San Diego, (7) Salesforce AI Research, (8) Duke University, (9) University of Illinois at Chicago)

arXiv:2302.09419 · cs.AI · May 2, 2023 · 152 cites

Open Access

TL;DR

This comprehensive survey reviews recent advancements, challenges, and future directions of Pretrained Foundation Models across multiple data modalities, highlighting their impact on AI progress and open research problems.

Contribution

It provides an updated, detailed overview of PFM methods, applications, and challenges across text, image, and graph data modalities, including model efficiency and security issues.

Findings

01. PFMs have significantly advanced AI across multiple modalities.

02. Research highlights the importance of model efficiency and security.

03. Future directions include improving scalability, reasoning, and cross-domain learning.

Abstract

Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is trained on large-scale data, which provides a reasonable parameter initialization for a wide range of downstream applications. BERT learns bidirectional encoder representations from Transformers, which are trained on large datasets as contextual language models. Similarly, the generative pretrained transformer (GPT) method employs Transformers as the feature extractor and is trained using an autoregressive paradigm on large datasets. More recently, ChatGPT has shown the promise of large language models; it applies an autoregressive language model with zero-shot or few-shot prompting. The remarkable achievements of PFMs have brought significant breakthroughs to various fields of AI. Numerous studies have proposed different…

Taxonomy

Topics: Topic Modeling · FinTech, Crowdfunding, Digital Finance · Artificial Intelligence in Law

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Warmup With Cosine Annealing