Efficient Context Scaling with LongCat ZigZag Attention

Chen Zhang; Yang Bai; Jiahuan Li; Anchun Gui; Keheng Wang; Feifan Liu; Guanyu Wu; Yuwei Jiang; Defei Bu; Li Wei; Haihang Jing; Hongyin Tang; Xin Chen; Xiangzhou Huang; Fengcun Li; Rongxiang Weng; Yulei Qian; Yifan Lu; Yerui Sun; Jingang Wang; Yuchen Xie; Xunliang Cai

arXiv:2512.23966·cs.CL·January 7, 2026

TL;DR

This paper presents LongCat ZigZag Attention (LoZA), a sparse attention scheme that converts existing full-attention models into sparse variants on a limited compute budget, speeding up both prefill-intensive and decode-intensive workloads at context lengths of up to 1 million tokens.

Contribution

The paper introduces LoZA, a sparse attention scheme that transforms existing full-attention models into sparse models with a rather limited compute budget, making them efficient on very long contexts.

Findings

01

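LoZA achieves significant speed-ups in long-context scenarios, for both prefill-intensive cases (e.g., retrieval-augmented generation) and decode-intensive cases (e.g., tool-integrated reasoning).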

02

Applying LoZA to LongCat-Flash during mid-training yields LongCat-Flash-Exp, a long-context foundation model that can process up to 1 million tokens.

03

The resulting model enables efficient long-term reasoning and long-horizon agentic capabilities.

Abstract

We introduce LongCat ZigZag Attention (LoZA), a sparse attention scheme designed to transform any existing full-attention model into a sparse version with a rather limited compute budget. In long-context scenarios, LoZA can achieve significant speed-ups for both prefill-intensive (e.g., retrieval-augmented generation) and decode-intensive (e.g., tool-integrated reasoning) cases. Specifically, by applying LoZA to LongCat-Flash during mid-training, we serve LongCat-Flash-Exp as a long-context foundation model that can swiftly process up to 1 million tokens, enabling efficient long-term reasoning and long-horizon agentic capabilities.

Code & Models

Models

🤗 meituan-longcat/LongCat-Flash-Thinking-ZigZag (model · 12 downloads · 31 likes)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning