InternLM2 Technical Report
Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang

TL;DR
InternLM2 is a new open-source large language model that outperforms its predecessors on multiple benchmarks, long-context understanding, and subjective evaluations through innovative training and alignment techniques.
Contribution
It introduces InternLM2 with advanced pre-training, long-context modeling, and a novel COOL RLHF alignment method, providing a comprehensive open-source LLM with detailed training insights.
Findings
Outperforms predecessors on 30 benchmarks and 6 evaluation dimensions.
Effectively models long contexts up to 32k tokens.
Demonstrates strong performance on the Needle-in-a-Haystack test.
Abstract
The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k "Needle-in-a-Haystack"…
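The abstract cites a 200k "Needle-in-a-Haystack" evaluation. As a rough illustration of how such a probe is commonly built (a generic sketch, not the authors' evaluation code), a short fact is hidden at a chosen relative depth inside long filler text, and the model is asked to retrieve it. The function names, the word-count length proxy (real harnesses count tokens with the model's own tokenizer), and the substring pass criterion are all assumptions made for illustration.

```python
from typing import List


def build_haystack(needle: str, filler: List[str], target_words: int, depth: float) -> str:
    """Hypothetical helper: repeat filler sentences until roughly target_words
    words accumulate, then insert the needle at the sentence boundary closest
    to the relative depth (0.0 = start of context, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    if not filler:
        raise ValueError("filler must contain at least one sentence")
    sentences: List[str] = []
    words = 0
    i = 0
    # Word count is only a stand-in for token count (assumption).
    while words < target_words:
        sentence = filler[i % len(filler)]
        sentences.append(sentence)
        words += len(sentence.split())
        i += 1
    sentences.insert(round(depth * len(sentences)), needle)
    return " ".join(sentences)


def build_prompt(haystack: str, question: str) -> str:
    """Hypothetical prompt template: long context first, retrieval question last."""
    return f"{haystack}\n\nBased only on the text above, answer: {question}"


def passed(answer: str, expected: str) -> bool:
    """Simplified scoring (assumption): pass if the expected fact appears in the answer."""
    return expected.lower() in answer.lower()


if __name__ == "__main__":
    filler = ["The sky was grey over the harbour.", "Ships came and went all day."]
    needle = "The secret passphrase is blue-falcon-42."
    haystack = build_haystack(needle, filler, target_words=2000, depth=0.5)
    prompt = build_prompt(haystack, "What is the secret passphrase?")
    # The prompt would be sent to the model under test; here we only score a sample reply.
    print(passed("It is blue-falcon-42.", "blue-falcon-42"))
```

Sweeping `target_words` (context length) and `depth` over a grid and plotting the pass rate per cell yields the familiar heat map used to report this test.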
Code & Models
- 🤗 bespokelabs/Bespoke-MiniCheck-7B (model) · 16k downloads · ♡ 80
- 🤗 internlm/internlm3-8b-instruct (model) · 40k downloads · ♡ 230
- 🤗 internlm/internlm2-chat-7b (model) · 47k downloads · ♡ 83
- 🤗 internlm/internlm2-chat-20b (model) · 19k downloads · ♡ 88
- 🤗 internlm/internlm2-chat-7b-sft (model) · 3.8k downloads · ♡ 6
- 🤗 internlm/internlm2-chat-20b-sft (model) · 71 downloads · ♡ 12
- 🤗 internlm/internlm2-base-7b (model) · 20k downloads · ♡ 10
- 🤗 internlm/internlm2-7b (model) · 25k downloads · ♡ 43
- 🤗 internlm/internlm2-base-20b (model) · 18k downloads · ♡ 8
- 🤗 internlm/internlm2-20b (model) · 22k downloads · ♡ 59
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Machine Learning in Healthcare
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Softmax · Dropout
