52B to 1T: Lessons Learned via Tele-FLM Series
Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

TL;DR
This technical report describes progressively growing a large language model from 52 billion to 102 billion and then to 1 trillion parameters, reports supervised fine-tuning observations on Tele-FLM-52B, and announces an open-source 1T checkpoint, Tele-FLM-1T, to support further research.
Contribution
The report presents supervised fine-tuning (SFT) observations on Tele-FLM-52B and best practices for progressively growing a model from 52B to 102B and then to 1T parameters, and commits to open-sourcing the Tele-FLM-1T checkpoint.
Findings
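Observations from Supervised Fine-tuning (SFT) on Tele-FLM-52B support a "less is more" approach to SFT data construction.
Experiments and analyses identify best practices for progressively growing a model from 52B to 102B, and then to 1T parameters.
A 1T checkpoint, Tele-FLM-1T, will be open-sourced to advance further training and research.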
Abstract
Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion-parameter model. We delve into two primary areas: we first discuss our observation of Supervised Fine-tuning (SFT) on Tele-FLM-52B, which supports the "less is more" approach for SFT data construction; second, we demonstrate our experiments and analyses on the best practices for progressively growing a model from 52 billion to 102 billion, and subsequently to 1 trillion parameters. We will open-source a 1T model checkpoint, namely Tele-FLM-1T, to advance further training and research.
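Illustration
The abstract does not say which growth operators are used to move from 52B to 102B to 1T parameters. As a rough intuition for what "progressively growing" a model can mean, the sketch below shows one generic idea, function-preserving width expansion in the style of Net2Net, on a toy two-layer MLP. It is an assumption-laden illustration and not the authors' method; the names `mlp` and `widen` and all sizes are hypothetical.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mlp(W1, b1, W2, x):
    # Toy two-layer MLP: W2 @ relu(W1 @ x + b1).
    hidden = relu([a + b for a, b in zip(matvec(W1, x), b1)])
    return matvec(W2, hidden)

def widen(W1, b1, W2, new_width, rng):
    """Grow the hidden layer to new_width units without changing the output.

    Hypothetical helper: new hidden units are copies of existing ones, and
    each unit's outgoing weights are divided by its number of copies.
    """
    old_width = len(W1)
    # mapping[j] is the old hidden unit that new unit j copies.
    extra = [rng.randrange(old_width) for _ in range(new_width - old_width)]
    mapping = list(range(old_width)) + extra
    counts = [mapping.count(i) for i in range(old_width)]
    W1_new = [list(W1[i]) for i in mapping]
    b1_new = [b1[i] for i in mapping]
    W2_new = [[row[i] / counts[i] for i in mapping] for row in W2]
    return W1_new, b1_new, W2_new

rng = random.Random(0)
d_in, hidden, d_out = 4, 3, 2
W1 = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
W2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(d_out)]
x = [rng.uniform(-1, 1) for _ in range(d_in)]

small_out = mlp(W1, b1, W2, x)
W1w, b1w, W2w = widen(W1, b1, W2, 6, rng)
wide_out = mlp(W1w, b1w, W2w, x)

# The widened network should reproduce the small network's output.
max_diff = max(abs(a - b) for a, b in zip(small_out, wide_out))
print("hidden width:", len(W1), "->", len(W1w))
print("max output difference:", max_diff)
```

Starting a larger model from a point that reproduces the smaller model's function is one reason growth can save compute compared with training the larger model from scratch. Growing a real transformer also involves depth, attention heads, and optimizer state, none of which this toy covers.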
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Machine Learning and Data Classification
Methods: Shrink and Fine-Tune
