HiCI: Hierarchical Construction-Integration for Long-Context Attention
Xiangyu Zeng, Qi Xu, Yunke Wang, Chang Xu

TL;DR
HiCI introduces a hierarchical attention module inspired by cognitive theories, significantly improving long-context language modeling efficiency and performance by constructing and integrating segment-level representations.
Contribution
The paper presents HiCI, a hierarchical attention mechanism that improves long-context modeling with fewer than 5.5% additional parameters and yields consistent gains over strong baselines on language modeling, retrieval, and instruction-following benchmarks.
Findings
Extended LLaMA-2 context from 4K to 100K tokens (7B) and 64K tokens (13B) with fewer than 5.5% additional parameters.
Achieved performance comparable to proprietary models on topic retrieval.
Surpassed GPT-3.5-Turbo-16K on code comprehension tasks.
Abstract
Long-context language modeling is commonly framed as a scalability challenge of token-level attention, yet local-to-global information structuring remains largely implicit in existing approaches. Drawing on cognitive theories of discourse comprehension, we propose HiCI (Hierarchical Construction-Integration), a hierarchical attention module that constructs segment-level representations, integrates them into a shared global context, and broadcasts both to condition segment-level attention. We validate HiCI through parameter-efficient adaptation of LLaMA-2 with fewer than 5.5% additional parameters, extending context from 4K to 100K tokens (7B) and 64K tokens (13B). Across language modeling, retrieval, and instruction-following benchmarks, HiCI yields consistent improvements over strong baselines, including matching proprietary models on topic retrieval and surpassing GPT-3.5-Turbo-16K on…
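The construct, integrate, and broadcast steps named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the mean pooling, the single unmasked attention head, the absence of learned projections, and the names `attend` and `hierarchical_attention` are all assumptions chosen for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    """Single-head scaled dot-product attention, no masking (illustrative)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def hierarchical_attention(x, segment_len):
    """Toy construct/integrate/broadcast pass over x of shape (seq_len, d)."""
    seq_len, d = x.shape
    assert seq_len % segment_len == 0, "sketch assumes equal-length segments"
    segments = x.reshape(-1, segment_len, d)          # (n_seg, segment_len, d)

    # Construction: one summary vector per segment.
    # Mean pooling is an assumption, not the paper's method.
    local = segments.mean(axis=1)                     # (n_seg, d)

    # Integration: summaries attend to each other, then are pooled
    # into a single shared global context vector (also an assumption).
    integrated = attend(local, local, local)          # (n_seg, d)
    global_ctx = integrated.mean(axis=0, keepdims=True)  # (1, d)

    # Broadcast: each segment's tokens attend over the global context,
    # their own segment summary, and the segment's own tokens.
    out = np.empty_like(segments)
    for i, seg in enumerate(segments):
        kv = np.concatenate([global_ctx, local[i:i + 1], seg], axis=0)
        out[i] = attend(seg, kv, kv)
    return out.reshape(seq_len, d)
```

In this sketch each token attends to only `segment_len + 2` keys, so the cost grows linearly with the number of segments, while the shared global vector still lets information from one segment reach every other segment.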
Code & Models
- 🤗 ZengXiangyu/Qwen3-8b-HiCI-48k-500steps (model · 6 downloads · ♡ 1)
- 🤗 ZengXiangyu/Qwen3-8b-HiCI-48k-1000steps (model · 4 downloads · ♡ 1)
- 🤗 ZengXiangyu/Llama-2-7b-HiCI-16k (model · 15 downloads · ♡ 1)
- 🤗 ZengXiangyu/Llama-2-7b-HiCI-16k-SFT (model · 21 downloads)
- 🤗 ZengXiangyu/Llama-3-8b-HiCI-16k (model · 13 downloads)
- 🤗 ZengXiangyu/Llama-3-8b-HiCI-32k (model · 14 downloads)
