GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

GLM-V Team: Wenyi Hong; Wenmeng Yu; Xiaotao Gu; Guo Wang; Guobing Gan; Haomiao Tang; Jiale Cheng; Ji Qi; Junhui Ji; Lihang Pan; Shuaiqi Duan; Weihan Wang; Yan Wang; Yean Cheng; Zehai He; Zhe Su; Zhen Yang; Ziyang Pan; Aohan Zeng; Baoxu Wang; Bin Chen; Boyan Shi; Changyu Pang; Chenhui Zhang; Da Yin; Fan Yang; Guoqing Chen; Haochen Li; Jiale Zhu; Jiali Chen; Jiaxing Xu; Jiazheng Xu; Jing Chen; Jinghao Lin; Jinhao Chen; Jinjiang Wang; Junjie Chen; Leqi Lei; Letian Gong; Leyi Pan; Mingdao Liu; Mingde Xu; Mingzhi Zhang; Qinkai Zheng; Ruiliang Lyu; Shangqin Tu; Sheng Yang; Shengbiao Meng; Shi Zhong; Shiyu Huang; Shuyuan Zhao; Siyan Xue; Tianshu Zhang; Tianwei Luo; Tianxiang Hao; Tianyu Tong; Wei Jia; Wenkai Li; Xiao Liu; Xiaohan Zhang; Xin Lyu; Xinyu Zhang; Xinyue Fan; Xuancheng Huang; Yadong Xue; Yanfeng Wang; Yanling Wang; Yanzi Wang; Yifan An; Yifan Du; Yiheng Huang; Yilin Niu; Yiming Shi; Yu Wang; Yuan Wang; Yuanchang Yue; Yuchen Li; Yusen Liu; Yutao Zhang; Yuting Wang; Yuxuan Zhang; Zhao Xue; Zhengxiao Du; Zhenyu Hou; Zihan Wang; Peng Zhang; Debing Liu; Bin Xu; Juanzi Li; Minlie Huang; Yuxiao Dong; Jie Tang

arXiv:2507.01006 · cs.CV · January 5, 2026

TL;DR

This paper introduces a family of vision-language models, GLM-4.1V-Thinking, GLM-4.5V, and GLM-4.6V, that leverage reinforcement learning and large-scale pre-training to achieve state-of-the-art multimodal reasoning across diverse tasks.

Contribution

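The paper presents a reasoning-centric training framework that combines large-scale pre-training of a vision foundation model with Reinforcement Learning with Curriculum Sampling (RLCS), together with a family of vision-language models trained under it. The reported gains span STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation.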

Findings

01 In an evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance.

02 GLM-4.1V-9B-Thinking outperforms larger models on many tasks.

03 The models are open-source and offer native tool use and an extended context window.

Abstract

We present GLM-4.1V-Thinking, GLM-4.5V, and GLM-4.6V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance…
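
The abstract names Reinforcement Learning with Curriculum Sampling (RLCS) but does not say how samples are chosen. Purely as an illustration of what difficulty-aware sampling can look like, the sketch below favors prompts whose recent success rate is near 0.5 and samples less often from prompts that are almost always solved or almost always failed. The weighting rule, the `floor` and `momentum` parameters, and all function names are assumptions made for this example, not the paper's algorithm.

```python
import random


def curriculum_weights(pass_rates, floor=0.05):
    """Weight each prompt by 4 * p * (1 - p) plus a small floor.

    pass_rates maps a prompt id to the fraction of recent rollouts that
    earned a reward (hypothetical bookkeeping, not taken from the paper).
    The floor keeps very easy and very hard prompts from vanishing entirely.
    """
    return {pid: floor + 4.0 * p * (1.0 - p) for pid, p in pass_rates.items()}


def sample_batch(pass_rates, batch_size, rng=None):
    """Draw prompt ids with replacement, in proportion to their weights."""
    rng = rng or random.Random()
    weights = curriculum_weights(pass_rates)
    ids = list(weights)
    return rng.choices(ids, weights=[weights[i] for i in ids], k=batch_size)


def update_pass_rate(pass_rates, pid, rewards, momentum=0.9):
    """Update one prompt's success rate with an exponential moving average."""
    observed = sum(rewards) / len(rewards)
    pass_rates[pid] = momentum * pass_rates[pid] + (1.0 - momentum) * observed
```

In a training loop one would call `sample_batch` to pick prompts, run rollouts, and then call `update_pass_rate` with the binary rewards so that the sampling distribution tracks the model's current ability.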

Code & Models

Repositories

thudm/glm-4.1v-thinking (Official)
