Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Junru Lu, Jiarui Qin, Lingfeng Qiao, Yinghui Li, Xinyi Dai, Bo Ke, Jianfeng He, Ruizhi Qiao, Di Yin, Xing Sun, Yunsheng Wu, Yinsong Liu, Shuangyin Liu, Mingkong Tang, Haodong Lin, Jiayi Kuang, Fanxu Meng, Xiaojuan Tang, Yunjia Xi, Junjie Huang, Haotong Yang, Zhenyi Shen

TL;DR
Youtu-LLM is a lightweight 1.96B-parameter language model. Its novel architecture and training curriculum give it reasoning, planning, and agentic abilities comparable to those of larger models.
Contribution
It introduces a compact Multi-Latent Attention architecture with long-context support and a multi-stage curriculum for training lightweight LLMs with agentic capabilities.
Findings
Sets a new state of the art for sub-2B LLMs.
Achieves competitive performance on general benchmarks.
Surpasses existing state-of-the-art models on agent-specific tasks.
Abstract
We introduce Youtu-LLM, a lightweight yet powerful language model that harmonizes high computational efficiency with native agentic intelligence. Unlike typical small models that rely on distillation, Youtu-LLM (1.96B) is pre-trained from scratch to systematically cultivate reasoning and planning capabilities. The key technical advancements are as follows: (1) Compact Architecture with Long-Context Support: Built on a dense Multi-Latent Attention (MLA) architecture with a novel STEM-oriented vocabulary, Youtu-LLM supports a 128k context window. This design enables robust long-context reasoning and state tracking within a minimal memory footprint, making it ideal for long-horizon agent and reasoning tasks. (2) Principled "Commonsense-STEM-Agent" Curriculum: We curated a massive corpus of approximately 11T tokens and implemented a multi-stage training strategy. By progressively shifting…
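The "minimal memory footprint" in item (1) comes from how Multi-Latent Attention caches keys and values. The sketch below shows the arithmetic. It assumes the DeepSeek-V2 formulation of MLA, which caches one compressed latent per token per layer plus a small decoupled RoPE key. It compares the per-sequence KV-cache size at a 128k context against standard multi-head attention (MHA). Every dimension used here (32 layers, 16 heads of size 128, latent size 512, RoPE key size 64, bf16 storage) is a hypothetical illustration and not Youtu-LLM's published configuration.

```python
def kv_cache_bytes_mha(n_layers: int, n_heads: int, d_head: int,
                       seq_len: int, bytes_per_elem: int = 2) -> int:
    """Standard MHA: every layer caches one key and one value vector per head, per token."""
    per_token_per_layer = 2 * n_heads * d_head * bytes_per_elem
    return n_layers * seq_len * per_token_per_layer


def kv_cache_bytes_mla(n_layers: int, d_latent: int, d_rope: int,
                       seq_len: int, bytes_per_elem: int = 2) -> int:
    """MLA (DeepSeek-V2 style, assumed): every layer caches one shared compressed
    latent c_KV of size d_latent plus a decoupled RoPE key of size d_rope, per token.
    Per-head keys and values are re-derived from the latent at attention time."""
    per_token_per_layer = (d_latent + d_rope) * bytes_per_elem
    return n_layers * seq_len * per_token_per_layer


if __name__ == "__main__":
    # Hypothetical configuration, for illustration only.
    seq_len = 128 * 1024  # a 128k-token context window
    mha = kv_cache_bytes_mha(n_layers=32, n_heads=16, d_head=128, seq_len=seq_len)
    mla = kv_cache_bytes_mla(n_layers=32, d_latent=512, d_rope=64, seq_len=seq_len)
    gib = 2 ** 30
    print(f"MHA KV cache: {mha / gib:.1f} GiB")  # MHA KV cache: 32.0 GiB
    print(f"MLA KV cache: {mla / gib:.1f} GiB")  # MLA KV cache: 4.5 GiB
    print(f"reduction:    {mha / mla:.2f}x")     # reduction:    7.11x
```

Under these assumed sizes, caching a 576-dimensional latent instead of 4,096 key/value values per layer shrinks the cache by about 7x. That reduction is what makes long contexts feasible on small devices. The real ratio for Youtu-LLM depends on its actual head count, head size, and latent dimension.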
Code & Models
- 🤗 tencent/Youtu-VL-4B-Instruct-GGUF · model · 712 dl · ♡ 60
- 🤗 tencent/Youtu-LLM-2B-Base · model · 3.3k dl · ♡ 42
- 🤗 tencent/Youtu-LLM-2B · model · 750 dl · ♡ 227
- 🤗 tencent/Youtu-LLM-2B-GGUF · model · 291 dl · ♡ 25
- 🤗 tencent/Youtu-Parsing · model · 122 dl · ♡ 38
- 🤗 tencent/Youtu-VL-4B-Instruct · model · 422 dl · ♡ 154
- 🤗 onnx-community/Youtu-LLM-2B-ONNX · model · 6 dl · ♡ 2
- 🤗 Mungert/Youtu-VL-4B-Instruct-GGUF · model · 144 dl · ♡ 1
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
