Kimi K2.5: Visual Agentic Intelligence
Kimi Team: Tongtong Bai, Yifan Bai, Yiping Bao, S.H. Cai, Yuan Cao, Y. Charles, H.S. Che, Cheng Chen, Guanduo Chen, Huarong Chen, Jia Chen, Jiahao Chen, Jianlong Chen, Jun Chen, Kefan Chen, Liang Chen, Ruijue Chen, Xinhao Chen, Yanru Chen, Yanxu Chen, Yicun Chen, Yimin Chen

TL;DR
Kimi K2.5 is an open-source multimodal agentic model that integrates text and vision, featuring a novel agent orchestration framework, achieving state-of-the-art results and reduced latency in complex tasks.
Contribution
The paper introduces Kimi K2.5, a multimodal model with joint text-vision training and a new agent swarm framework for dynamic task decomposition and execution.
Findings
Achieves state-of-the-art performance in coding, vision, reasoning, and agentic tasks.
Reduces latency by up to 4.5 times compared to single-agent systems.
Provides an open-source post-trained model checkpoint for future research.
Abstract
We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. K2.5 emphasizes the joint optimization of text and vision so that the two modalities enhance each other. This includes a series of techniques such as joint text-vision pre-training, zero-vision SFT, and joint text-vision reinforcement learning. Building on this multimodal foundation, K2.5 introduces Agent Swarm, a self-directed parallel agent orchestration framework that dynamically decomposes complex tasks into heterogeneous sub-problems and executes them concurrently. Extensive evaluations show that Kimi K2.5 achieves state-of-the-art results across various domains including coding, vision, reasoning, and agentic tasks. Agent Swarm also reduces latency by up to 4.5× over single-agent baselines. We release the post-trained Kimi K2.5 model checkpoint to facilitate future…
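As a rough illustration of why running sub-problems concurrently can cut wall-clock latency, the following is a minimal sketch using Python's standard `asyncio` module. It is not the Agent Swarm implementation, which the abstract does not describe in detail: the sub-task names, the fixed sleep durations, and the helpers `run_subtask`, `run_sequential`, `run_concurrent`, and `timed` are all hypothetical stand-ins chosen for this example.

```python
import asyncio
import time


async def run_subtask(name: str, seconds: float) -> str:
    # Hypothetical stand-in for a sub-agent; the sleep simulates model or tool latency.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def run_sequential(subtasks: dict[str, float]) -> list[str]:
    # Single-agent style baseline: sub-problems are handled one after another.
    return [await run_subtask(name, secs) for name, secs in subtasks.items()]


async def run_concurrent(subtasks: dict[str, float]) -> list[str]:
    # Parallel style: every sub-problem is dispatched at once and awaited together.
    results = await asyncio.gather(
        *(run_subtask(name, secs) for name, secs in subtasks.items())
    )
    return list(results)


def timed(coro) -> tuple[list[str], float]:
    # Run a coroutine to completion and report its wall-clock duration in seconds.
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Assumed toy workload: three independent sub-problems of equal cost.
    subtasks = {"search": 0.05, "code": 0.05, "vision": 0.05}
    _, t_seq = timed(run_sequential(subtasks))
    _, t_par = timed(run_concurrent(subtasks))
    print(f"sequential: {t_seq:.3f}s, concurrent: {t_par:.3f}s")
```

In this toy setting the concurrent run takes roughly as long as the slowest sub-task rather than the sum of all of them. Real speedups depend on how independent the sub-problems are and on orchestration overhead, neither of which this sketch models.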
Code & Models
- 🤗 moonshotai/Kimi-K2.5 · model · 4.4M dl · ♡ 2396
- 🤗 infatree/affine-guard-5FW4FofNoqZmKS4nkbznL164ajvjVxuVB6z8LjLuve7FhmjK · model · 44 dl
- 🤗 Eisenbergg/Affine-GregNew · model · 2 dl
- 🤗 Nailioq/Open · model · 1 dl
- 🤗 ltnpro/Kimi-K2.5 · model · 5 dl
- 🤗 LantzShaw/Kimi-K2.5 · model · 7 dl
- 🤗 Shamshir77/Kimi-K2.5 · model · 16 dl
- 🤗 paol4/Kimi-K2.5 · model · 2 dl
- 🤗 Rakesh1l/Tester · model
- 🤗 Janchan123/Kimi-K2.5 · model · 9 dl
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Topic Modeling
