Tele-FLM Technical Report

Xiang Li; Yiqun Yao; Xin Jiang; Xuezhi Fang; Chao Wang; Xinzhang Liu; Zihan Wang; Yu Zhao; Xin Wang; Yuyao Huang; Shuangyong Song; Yongxiang Li; Zheng Zhang; Bo Zhao; Aixin Sun; Yequan Wang; Zhongjiang He; Zhongyuan Wang; Xuelong Li; Tiejun Huang

arXiv:2404.16645·cs.CL·April 26, 2024

TL;DR

This paper introduces Tele-FLM, a 52-billion-parameter multilingual large language model with a stable, efficient pre-training method, demonstrating strong multilingual and English/Chinese capabilities while sharing detailed training insights.

Contribution

The paper presents Tele-FLM, a novel open-source multilingual LLM with a stable training paradigm and competitive performance, addressing the lack of scalable methodologies for models beyond 50B parameters.

Findings

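01. Tele-FLM shows strong multilingual language modeling ability, measured by bits-per-byte (BPB) on text corpora.

02. On English and Chinese foundation model evaluations, it performs comparably to open-source models trained with more pre-training FLOPs, such as Llama2-70B and DeepSeek-67B.

03. The report shares core design and training practices for community benefit.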

Abstract

Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies on efficiently scaling LLMs beyond 50 billion parameters with minimum trial-and-error cost and computational resources. In this report, we introduce Tele-FLM (aka FLM-2), a 52B open-sourced multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgment capabilities. Tele-FLM demonstrates superior multilingual language modeling abilities, measured by BPB on textual corpus. Besides, in both English and Chinese foundation model evaluation, it is comparable to strong open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B and DeepSeek-67B. In addition to the model weights, we share the…
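
The BPB (bits-per-byte) metric mentioned in the abstract normalizes a model's language modeling loss by the byte length of the text, which makes models with different tokenizers comparable. A minimal sketch of the usual definition follows, assuming per-token negative log-likelihoods in nats are already available. The function name `bits_per_byte` and its inputs are illustrative and not taken from the report, which may compute BPB differently in detail.

```python
import math


def bits_per_byte(token_nlls_nats, text):
    """Return bits-per-byte for `text` given per-token losses.

    token_nlls_nats: hypothetical list of per-token negative
        log-likelihoods (natural log), one per token of `text`.
    text: the evaluated string; its UTF-8 length is the normalizer.
    """
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must be non-empty")
    # Convert the summed loss from nats to bits, then divide by bytes.
    total_bits = sum(token_nlls_nats) / math.log(2)
    return total_bits / n_bytes


# Toy usage: 8 tokens, each costing exactly 1 bit, over a 4-byte string.
example = bits_per_byte([math.log(2)] * 8, "abcd")
```

Because the denominator counts bytes rather than tokens, a tokenizer that splits text into fewer tokens gains no advantage from that alone. Lower BPB indicates better compression of the text.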

Taxonomy

Topics: Topic Modeling · Authorship Attribution and Profiling · Natural Language Processing Techniques