InternLM2 Technical Report

Zheng Cai; Maosong Cao; Haojiong Chen; Kai Chen; Keyu Chen; Xin Chen; Xun Chen; Zehui Chen; Zhi Chen; Pei Chu; Xiaoyi Dong; Haodong Duan; Qi Fan; Zhaoye Fei; Yang Gao; Jiaye Ge; Chenya Gu; Yuzhe Gu; Tao Gui; Aijia Guo; Qipeng Guo; Conghui He; Yingfan Hu; Ting Huang; Tao Jiang; Penglong Jiao; Zhenjiang Jin; Zhikai Lei; Jiaxing Li; Jingwen Li; Linyang Li; Shuaibin Li; Wei Li; Yining Li; Hongwei Liu; Jiangning Liu; Jiawei Hong; Kaiwen Liu; Kuikun Liu; Xiaoran Liu; Chengqi Lv; Haijun Lv; Kai Lv; Li Ma; Runyuan Ma; Zerun Ma; Wenchang Ning; Linke Ouyang; Jiantao Qiu; Yuan Qu; Fukai Shang; Yunfan Shao; Demin Song; Zifan Song; Zhihao Sui; Peng Sun; Yu Sun; Huanze Tang; Bin Wang; Guoteng Wang; Jiaqi Wang; Jiayu Wang; Rui Wang; Yudong Wang; Ziyi Wang; Xingjian Wei; Qizhen Weng; Fan Wu; Yingtong Xiong; Chao Xu; Ruiliang Xu; Hang Yan; Yirong Yan; Xiaogui Yang; Haochen Ye; Huaiyuan Ying; Jia Yu; Jing Yu; Yuhang Zang; Chuyu Zhang; Li Zhang; Pan Zhang; Peng Zhang; Ruijie Zhang; Shuo Zhang; Songyang Zhang; Wenjian Zhang; Wenwei Zhang; Xingcheng Zhang; Xinyue Zhang; Hui Zhao; Qian Zhao; Xiaomeng Zhao; Fengzhe Zhou; Zaida Zhou; Jingming Zhuo; Yicheng Zou; Xipeng Qiu; Yu Qiao; Dahua Lin

arXiv:2403.17297·cs.CL·March 27, 2024·27 cites

TL;DR

InternLM2 is a new open-source large language model that outperforms its predecessors on multiple benchmarks, in long-context modeling, and in open-ended subjective evaluations through innovative pre-training and alignment techniques.

Contribution

The paper introduces InternLM2 and details its pre-training, its long-context modeling, and a novel alignment method, Conditional Online RLHF (COOL RLHF), providing a comprehensive open-source LLM together with detailed insights into how it was trained.

Findings

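1. Outperforms its predecessors on 30 benchmarks spanning 6 evaluation dimensions.
2. Models long contexts effectively: trained first on 4k-token sequences, then extended to 32k tokens in the pre-training and fine-tuning stages.
3. Demonstrates strong performance on the 200k-token Needle-in-a-Haystack test.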

Abstract

The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k "Needle-in-a-Haystack"…
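The abstract cites the 200k "Needle-in-a-Haystack" test as evidence of long-context ability. The sketch below illustrates the general shape of such a test: a single fact (the "needle") is hidden at a chosen relative depth inside long filler text, and the model is asked to retrieve it. It is not the paper's evaluation code. Assumptions: length is counted in whitespace-separated words rather than tokenizer tokens; the filler, needle, question, and all function names (`make_case`, `score`, `build_grid`) are hypothetical; and the call to the model itself is omitted.

```python
from dataclasses import dataclass

# Hypothetical filler, needle, and question text, chosen for illustration only.
FILLER = "The grass is green and the sky is blue on a calm afternoon."
NEEDLE = "The secret passphrase for the vault is amber-falcon-42."
QUESTION = "What is the secret passphrase for the vault?"
ANSWER = "amber-falcon-42"


@dataclass
class NeedleCase:
    context_words: int  # approximate haystack size in words (a stand-in for tokens)
    depth: float        # relative needle position: 0.0 = start, 1.0 = end
    prompt: str
    expected: str


def make_case(context_words: int, depth: float) -> NeedleCase:
    """Build one test prompt with the needle sentence placed at `depth`."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    # Repeat the filler sentence until the haystack reaches roughly the target size.
    n_sentences = max(1, context_words // len(FILLER.split()))
    sentences = [FILLER] * n_sentences
    # Insert at a sentence boundary so the needle is never split mid-sentence.
    sentences.insert(round(depth * n_sentences), NEEDLE)
    context = " ".join(sentences)
    prompt = f"{context}\n\nAnswer using only the text above. {QUESTION}"
    return NeedleCase(context_words, depth, prompt, ANSWER)


def score(case: NeedleCase, model_output: str) -> bool:
    """Count a retrieval as correct if the expected answer appears in the output."""
    return case.expected.lower() in model_output.lower()


def build_grid(lengths: list[int], depths: list[float]) -> list[NeedleCase]:
    """Sweep every (length, depth) pair."""
    return [make_case(n, d) for n in lengths for d in depths]


if __name__ == "__main__":
    for c in build_grid([1_000, 8_000], [0.0, 0.5, 1.0]):
        # A real run would send c.prompt to the model under test and pass its
        # reply to score(); here we only report where the needle landed.
        pos = c.prompt.index(NEEDLE) / len(c.prompt)
        print(f"{c.context_words:>6} words, depth {c.depth:.1f} -> needle at {pos:.2f}")
```

Sweeping lengths and depths yields the length-by-depth grid on which needle tests are commonly reported. Counting with a real tokenizer and extending the lengths toward 200k would bring the sketch closer to the setting the abstract describes.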

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Machine Learning in Healthcare

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Softmax · Dropout