Chronicals: A High-Performance Framework for LLM Fine-Tuning with 3.51x Speedup over Unsloth

Arjun S. Nair

arXiv:2601.02609·cs.LG·January 7, 2026

Open Access

TL;DR

Chronicals is an open-source fine-tuning framework for large language models that achieves a 3.51x speedup over Unsloth by combining four optimizations: fused Triton kernels, Cut Cross-Entropy, LoRA+, and Best-Fit Decreasing sequence packing. Together they make training faster and more memory-efficient.

Contribution

The paper introduces Chronicals, a framework that integrates four optimizations to accelerate LLM fine-tuning and reduce memory usage, and supports them with theoretical foundations.

Findings

01

Achieves a 3.51x speedup over Unsloth in full fine-tuning (41,184 vs. 11,736 tokens/second on Qwen2.5-0.5B with an A100-40GB)

02

Reduces memory usage, for example cutting logit memory from 5GB to 135MB with Cut Cross-Entropy

03

Provides mathematical proofs and an open-source implementation

Abstract

Large language model fine-tuning is bottlenecked by memory: a 7B parameter model requires 84GB (14GB for weights, 14GB for gradients, and 56GB for FP32 optimizer states), exceeding even A100-40GB capacity. We present Chronicals, an open-source training framework achieving a 3.51x speedup over Unsloth through four synergistic optimizations: (1) fused Triton kernels eliminating 75% of memory traffic via RMSNorm (7x), SwiGLU (5x), and QK-RoPE (2.3x) fusion; (2) Cut Cross-Entropy reducing logit memory from 5GB to 135MB through online softmax computation; (3) LoRA+ with theoretically derived 16x differential learning rates between adapter matrices; and (4) Best-Fit Decreasing sequence packing recovering 60-75% of compute wasted on padding. On Qwen2.5-0.5B with A100-40GB, Chronicals achieves 41,184 tokens/second for full fine-tuning versus Unsloth's 11,736 tokens/second (3.51x). For LoRA at…
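To make the fourth optimization concrete, here is a minimal sketch of Best-Fit Decreasing packing applied to sequence lengths. It is not taken from the Chronicals code base: the function names, the 1024-token capacity, and the sample lengths are all hypothetical, chosen only to illustrate the general algorithm. A real training pipeline would also need attention masks or position-ID resets so that packed sequences do not attend to one another, which this sketch leaves out.

```python
def pack_best_fit_decreasing(lengths, capacity):
    """Group sequence lengths into bins holding at most `capacity` tokens.

    Best-Fit Decreasing: visit sequences from longest to shortest and put
    each one into the open bin with the least free space that still fits it,
    opening a new bin only when no existing bin has room.
    """
    bins = []  # bins[i] is the list of sequence lengths packed into bin i
    free = []  # free[i] is the number of unused token slots left in bin i
    for length in sorted(lengths, reverse=True):
        if length > capacity:
            raise ValueError(f"sequence of length {length} exceeds capacity {capacity}")
        best = None
        for i, slots in enumerate(free):
            # Candidate bins must fit the sequence; prefer the tightest fit.
            if slots >= length and (best is None or slots < free[best]):
                best = i
        if best is None:
            bins.append([length])
            free.append(capacity - length)
        else:
            bins[best].append(length)
            free[best] -= length
    return bins


def padding_fraction(bins, capacity):
    """Fraction of token slots that hold padding rather than real tokens."""
    used = sum(sum(b) for b in bins)
    return 1.0 - used / (len(bins) * capacity)


# Hypothetical sequence lengths and context size, for illustration only.
example_lengths = [900, 300, 700, 120, 500, 200, 60, 1000]
example_capacity = 1024

packed = pack_best_fit_decreasing(example_lengths, example_capacity)
unpacked = [[n] for n in example_lengths]  # one padded row per sequence

print("packed bins:", packed)
print("padding without packing:", round(padding_fraction(unpacked, example_capacity), 3))
print("padding with packing:", round(padding_fraction(packed, example_capacity), 3))
```

The linear scan over open bins keeps the sketch short at the cost of quadratic time in the worst case. Keeping the free capacities in a sorted structure would bring the search down to logarithmic time per sequence.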

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Network Packet Processing and Optimization · Optimization and Packing Problems