KAT-V1: Kwai-AutoThink Technical Report

Zizheng Zhan; Ken Deng; Huaixi Tang; Wen Xiang; Kun Wu; Weihao Li; Wenqiang Zhu; Jingxuan Xu; Lecheng Huang; Zongxian Feng; Shaojie Wang; Shangpeng Yan; Xuxing Chen; Jiaheng Liu; Zhongyuan Peng; Zuchen Gao; Haoyang Huang; Xiaojiang Zhang; Jinghui Wang; Zheng Lin; Mengtong Li; Huiming Wang; Ziqi Zhan; Yanan Wu; Yuanxing Zhang; Jian Yang; Guang Chen; Haotian Zhang; Bin Chen; Bing Yu

arXiv:2507.08297·cs.CL·July 22, 2025

TL;DR

KAT-V1 is an open-source 40B large language model that addresses overthinking on reasoning-intensive tasks by switching dynamically between reasoning and non-reasoning modes according to task complexity. The report describes the training strategies behind this behavior and reports gains in accuracy and token efficiency on benchmarks and in real-world applications.

Contribution

The paper introduces four components aimed at more efficient and effective reasoning in large language models: the AutoThink paradigm, dual-regime dataset construction, Multi-Token Prediction (MTP)-enhanced knowledge distillation, and Step-SRPO reinforcement learning.

Findings

01. KAT-V1 outperforms state-of-the-art models on reasoning benchmarks.

02. KAT reduces token usage while maintaining high accuracy.

03. Deployment in Kwaipilot enhances real-world coding workflows.

Abstract

We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks. KAT is trained with an automatic-thinking paradigm that switches dynamically between reasoning and non-reasoning modes based on task complexity. First, we construct a dual-regime dataset using a novel tagging pipeline and a multi-agent synthesis strategy. We then apply Multi-Token Prediction (MTP)-enhanced knowledge distillation, enabling efficient and fine-grained reasoning transfer with minimal pretraining cost. We also implement a cold-start initialization strategy that introduces mode-selection priors using majority-vote signals and intent-aware prompting. Finally, we propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework, offering…
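The abstract mentions mode-selection priors derived from majority-vote signals but does not say how the votes are collected or how ties are resolved. As a rough illustration only, the sketch below assumes several independent judges each label a query as "think" or "no_think" and that the most common label becomes the training-time mode. The function name, the label strings, and the tie-breaking rule (fall back to "think") are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical label set; the paper's actual mode tags are not given in the abstract.
VALID_MODES = ("think", "no_think")


def majority_vote_mode(votes):
    """Pick a mode label from a list of per-judge votes.

    Assumption (not from the paper): ties fall back to "think",
    on the idea that reasoning is the safer default for ambiguous queries.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    for vote in votes:
        if vote not in VALID_MODES:
            raise ValueError(f"unknown mode label: {vote!r}")

    counts = Counter(votes)
    # Missing keys count as zero in a Counter, so this is safe for unanimous votes.
    if counts["no_think"] > counts["think"]:
        return "no_think"
    return "think"
```

For example, `majority_vote_mode(["think", "no_think", "no_think"])` selects the non-reasoning mode, while an even split selects the reasoning mode under the tie-breaking assumption above.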
