ChunkWise LoRA: Adaptive Sequence Partitioning for Memory-Efficient Low-Rank Adaptation and Accelerated LLM Inference

Ketan Thakkar; Maitreyi Chatterjee; Ramasubramanian Balasubramanian; Achyuthan Jootoo; Rajendra Ugrani

arXiv:2601.21109 · cs.CL · January 30, 2026

TL;DR

ChunkWise LoRA introduces an adaptive sequence partitioning method that dynamically assigns low-rank adaptation configurations to different sequence chunks, reducing LLM inference latency by up to 34% and memory usage by 38% while maintaining task performance.

Contribution

The paper presents a runtime mechanism for adaptive chunking and per-chunk rank assignment in LoRA, improving efficiency without sacrificing output quality.

Findings

1. Achieves up to 34% lower latency
2. Reduces memory usage by 38%
3. Maintains or improves task performance metrics

Abstract

Recent advances in low-rank adaptation (LoRA) have enabled efficient fine-tuning of large language models (LLMs) with minimal additional parameters. However, existing LoRA methods apply static rank configurations uniformly across all input tokens, ignoring variation in token complexity and computational requirements. In this work, we propose ChunkWise LoRA, a dynamic and adaptive approach that partitions sequences into variable-length chunks based on token complexity and assigns each chunk a tailored low-rank configuration. Our system introduces a runtime scheduler that estimates token difficulty, performs adaptive chunking, and selects per-chunk LoRA rank and scaling using a rank-ladder mechanism. To preserve output consistency, we further introduce a boundary-safe composition module and integrate policy-driven KV-cache strategies. Experiments on benchmark datasets such as Wikitext-103…
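
The scheduling pipeline the abstract describes (estimate token difficulty, cut the sequence into variable-length chunks, pick a rank and scaling per chunk from a ladder) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the per-token difficulty scores are supplied by the caller as values in [0, 1], the ladder values in `RANK_LADDER`, the constant `ALPHA`, the `max_len` and `tol` parameters, and the rule that a chunk is closed when the next token's difficulty drifts from the chunk mean. The scaling uses the standard LoRA convention of alpha divided by rank.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical rank ladder and LoRA alpha; the paper's actual values are not given here.
RANK_LADDER = (4, 8, 16, 32)
ALPHA = 16.0


@dataclass
class Chunk:
    start: int    # index of first token in the chunk
    end: int      # index one past the last token
    rank: int     # LoRA rank chosen for this chunk
    scale: float  # LoRA scaling, alpha / rank


def pick_rank(mean_difficulty: float) -> int:
    """Map a mean difficulty in [0, 1] to a rung of the ladder (assumed rule)."""
    clamped = min(max(mean_difficulty, 0.0), 1.0)
    idx = min(int(clamped * len(RANK_LADDER)), len(RANK_LADDER) - 1)
    return RANK_LADDER[idx]


def make_chunk(start: int, end: int, mean_difficulty: float) -> Chunk:
    rank = pick_rank(mean_difficulty)
    return Chunk(start, end, rank, ALPHA / rank)


def chunk_sequence(
    difficulty: Sequence[float], max_len: int = 4, tol: float = 0.25
) -> List[Chunk]:
    """Greedy chunking: close the current chunk when it reaches max_len tokens
    or when the next token's difficulty differs from the chunk mean by more than tol."""
    chunks: List[Chunk] = []
    start, total = 0, 0.0
    for i, d in enumerate(difficulty):
        n = i - start  # tokens already in the open chunk
        if n > 0 and (n >= max_len or abs(d - total / n) > tol):
            chunks.append(make_chunk(start, i, total / n))
            start, total = i, 0.0
        total += d
    if len(difficulty) > start:
        chunks.append(make_chunk(start, len(difficulty), total / (len(difficulty) - start)))
    return chunks


if __name__ == "__main__":
    # Made-up difficulty scores: three easy tokens, two hard ones, one easy one.
    for chunk in chunk_sequence([0.1, 0.1, 0.15, 0.9, 0.85, 0.2]):
        print(chunk)
```

In this sketch, easy spans land on the low rungs of the ladder and hard spans on the high rungs, which is the intuition behind spending adapter capacity only where tokens are estimated to need it. The boundary-safe composition and KV-cache policies mentioned in the abstract are not modeled.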

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis