Technical Report of TeleChat2, TeleChat2.5 and T1
Zihan Wang, Xinzhang Liu, Yitong Yao, Chao Wang, Yu Zhao, Zhihao Yang, Wenmin Deng, Kaipeng Jia, Jiaxin Peng, Yuyao Huang, Sishi Xiong, Zhuo Jiang, Kaidong Yu, Xiaohui Hu, Fubei Yao, Ruiyu Fang, Zhuoru Jiang, Ruiting Song, Qiyi Xie, Rui Xue, Xuewei He, Yanlei Xue, Zhu Yuan

TL;DR
This paper introduces TeleChat2, TeleChat2.5, and T1, a series of advanced language models with improved training strategies, larger datasets, and specialized capabilities for reasoning and speed, outperforming some proprietary models.
Contribution
The paper presents a new series of TeleChat models with enhanced training methods, domain-specific pretraining, and reinforcement learning, achieving superior performance in reasoning and coding tasks.
Findings
T1-115B outperforms GPT-4o and other proprietary models.
TeleChat2.5 offers rapid inference for real-time applications.
Models are publicly released for research and development.
Abstract
We introduce the latest series of TeleChat models: TeleChat2, TeleChat2.5, and T1, offering a significant upgrade over their predecessor, TeleChat. Despite minimal changes to the model architecture, the new series achieves substantial performance gains through enhanced training strategies in both the pre-training and post-training stages. The series begins with TeleChat2, which is pre-trained on 10 trillion high-quality and diverse tokens. This is followed by Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to further enhance its capabilities. TeleChat2.5 and T1 expand the pipeline by incorporating a continual pre-training phase with domain-specific datasets, combined with reinforcement learning (RL) to improve performance in code generation and mathematical reasoning tasks. The T1 variant is designed…
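The abstract names DPO as the preference-tuning step that follows SFT but gives no further detail here. As a rough illustration only, the standard DPO objective for a single preference pair can be sketched as below. The function names, the default beta = 0.1, and the use of summed per-response log-probabilities are assumptions for this sketch, not settings taken from the TeleChat2 report.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response
    (chosen = preferred, rejected = dispreferred) under either the
    trainable policy or the frozen reference model. beta = 0.1 is an
    assumed default, not a value from the report.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favors the chosen response over the rejected one.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed stably.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mean_dpo_loss(pairs, beta: float = 0.1) -> float:
    """Average the DPO loss over a batch of 4-tuples of log-probabilities."""
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)


if __name__ == "__main__":
    # Policy shifts probability toward the chosen response: loss < log(2).
    print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; it falls below that as the policy moves probability mass toward the preferred response relative to the reference.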
Code & Models
- 🤗 Tele-AI/TeleChat2-115B · model · 7 dl · ♡ 1
- 🤗 Tele-AI/TeleChat2-3B · model · 12k dl · ♡ 2
- 🤗 Tele-AI/TeleChat2-7B-32K · model · 31 dl · ♡ 2
- 🤗 Tele-AI/T1-35B · model · 8 dl · ♡ 1
- 🤗 Tele-AI/T1-115B · model · 82 dl · ♡ 2
- 🤗 Tele-AI/TeleChat2.5-115B · model · 2 dl · ♡ 1
- 🤗 Tele-AI/TeleChat2.5-35B · model · 2 dl · ♡ 2
- 🤗 Tele-AI/TeleChat3-36B-Thinking · model · 28 dl · ♡ 17
- 🤗 Tele-AI/TeleChat3-Coder-36B-Thinking · model · 27 dl · ♡ 3
- 🤗 FlagRelease/TeleChat3-36B-Thinking-mthreads-FlagOS · model · 8 dl
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
