A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT
Ce Zhou (1), Qian Li (2), Chen Li (2), Jun Yu (3), Yixin Liu (3), Guangjing Wang (1), Kai Zhang (3), Cheng Ji (2), Qiben Yan (1), Lifang He (3), Hao Peng (2), Jianxin Li (2), Jia Wu (4), Ziwei Liu (5), Pengtao Xie (6), Caiming Xiong (7), Jian Pei (8), Philip S. Yu (9)

TL;DR
This comprehensive survey reviews recent advancements, challenges, and future directions of Pretrained Foundation Models across multiple data modalities, highlighting their impact on AI progress and open research problems.
Contribution
It provides an updated, detailed overview of PFM methods, applications, and challenges across text, image, and graph data modalities, including model efficiency and security issues.
Findings
PFMs have significantly advanced AI across multiple modalities.
Research highlights the importance of model efficiency and security.
Future directions include improving scalability, reasoning, and cross-domain learning.
Abstract
Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is trained on large-scale data, which provides a reasonable parameter initialization for a wide range of downstream applications. BERT learns bidirectional encoder representations from Transformers, which are trained on large datasets as contextual language models. Similarly, the generative pretrained transformer (GPT) method employs Transformers as the feature extractor and is trained using an autoregressive paradigm on large datasets. Recently, ChatGPT has shown promising success among large language models; it applies an autoregressive language model with zero-shot or few-shot prompting. The remarkable achievements of PFMs have brought significant breakthroughs to various fields of AI. Numerous studies have proposed different…
Taxonomy
Topics: Topic Modeling · FinTech, Crowdfunding, Digital Finance · Artificial Intelligence in Law
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Warmup With Cosine Annealing
