Data-Free Layer-Adaptive Merging via Fisher Information for Long-to-Short Reasoning LLMs

Tian Xia

arXiv:2603.21705·cs.LG·March 24, 2026

Open Access

TL;DR

This paper introduces FIM-Merging, a theoretically justified, data-free method for layer-adaptive model merging in long-to-short reasoning LLMs, achieving state-of-the-art results without calibration data.

Contribution

It provides the first theoretical analysis linking Fisher Information to merging error and proposes a practical, data-free layer-adaptive merging method based on Fisher Information.
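The paper's own statement and proof are not reproduced on this page. As a rough orientation only, the following is a generic second-order sketch of how a per-layer curvature term can bound the loss change caused by a parameter perturbation, and how the Fisher Information Matrix relates to that curvature. The symbols $\theta^\star$, $\Delta_l$, $H_l$, and $F$ are notation assumed here, and the block-diagonal (per-layer) treatment of the Hessian is a simplifying assumption, not necessarily the paper's.

```latex
% Second-order expansion around a local optimum \theta^\star (gradient vanishes),
% with cross-layer Hessian blocks ignored (assumption):
\mathcal{L}(\theta^\star + \Delta) - \mathcal{L}(\theta^\star)
  \;\approx\; \tfrac{1}{2} \sum_{l} \Delta_l^{\top} H_l \, \Delta_l
  \;\le\; \tfrac{1}{2} \sum_{l} \lVert H_l \rVert_2 \, \lVert \Delta_l \rVert_2^{2}

% Fisher Information Matrix and its Hessian form, for y sampled from the model:
F(\theta)
  = \mathbb{E}_{x,\; y \sim p_\theta(\cdot \mid x)}
      \big[ \nabla_\theta \log p_\theta(y \mid x) \,
            \nabla_\theta \log p_\theta(y \mid x)^{\top} \big]
  = -\,\mathbb{E}_{x,\; y \sim p_\theta(\cdot \mid x)}
      \big[ \nabla_\theta^{2} \log p_\theta(y \mid x) \big]
```

When the model distribution matches the data distribution, the right-hand side coincides with the Hessian of the expected negative log-likelihood, which is the sense in which the Fisher matrix can stand in for the Hessian near an optimum.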

Findings

01. FIM-Merging achieves state-of-the-art performance on L2S benchmarks.

02. FIM-Merging reduces output length by over 90%.

03. Theoretical analysis explains the success of layer-adaptive merging methods.

Abstract

Model merging has emerged as a practical approach to combine capabilities of specialized large language models (LLMs) without additional training. In the Long-to-Short (L2S) scenario, merging a base model with a long-chain-of-thought reasoning model aims to preserve reasoning accuracy while reducing output length. Existing methods rely on Task Arithmetic and its variants, which implicitly assume that model outputs vary linearly with the merging coefficient, an assumption we show is systematically violated in L2S settings. We provide the first theoretical justification for layer-adaptive merging: we prove that merging error is bounded by a term proportional to the per-layer Hessian norm (Proposition 1), and establish that the Fisher Information Matrix (FIM) is a principled, computable proxy for this bound via the Fisher-Hessian equivalence at local optima. Building on this theory, we…
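To make the difference between a single merging coefficient and layer-adaptive coefficients concrete, here is a minimal Python sketch of Task Arithmetic applied per layer. It is an illustration under stated assumptions, not the paper's FIM-Merging procedure: the layer names, the `scores` dictionary (standing in for some per-layer sensitivity estimate), and the rule in `coeffs_from_scores` that maps scores to coefficients are all hypothetical.

```python
import numpy as np


def task_arithmetic_merge(base, reasoning, coeffs):
    """Merge layer by layer: merged_l = base_l + lam_l * (reasoning_l - base_l).

    `coeffs` is either one float (plain Task Arithmetic, same lam everywhere)
    or a dict mapping layer name -> lam_l (layer-adaptive variant).
    """
    merged = {}
    for name, w_base in base.items():
        lam = coeffs[name] if isinstance(coeffs, dict) else coeffs
        merged[name] = w_base + lam * (reasoning[name] - w_base)
    return merged


def coeffs_from_scores(scores, lam_min=0.2, lam_max=0.8):
    """Hypothetical placeholder rule (NOT taken from the paper):
    layers with a higher sensitivity score get a smaller coefficient,
    linearly interpolated between lam_max and lam_min.
    """
    values = np.array(list(scores.values()), dtype=float)
    span = values.max() - values.min()
    # Normalise scores to [0, 1]; fall back to zeros if all scores are equal.
    norm = (values - values.min()) / span if span > 0 else np.zeros_like(values)
    lams = lam_max - norm * (lam_max - lam_min)
    return dict(zip(scores.keys(), lams.tolist()))


# Toy two-layer "models" with made-up weights and made-up sensitivity scores.
base = {"layer0": np.zeros(3), "layer1": np.ones(3)}
reasoning = {"layer0": np.ones(3), "layer1": 3.0 * np.ones(3)}
scores = {"layer0": 1.0, "layer1": 5.0}

uniform = task_arithmetic_merge(base, reasoning, 0.5)
adaptive = task_arithmetic_merge(base, reasoning, coeffs_from_scores(scores))
```

With a single coefficient every layer moves the same fraction of the way toward the reasoning model. With per-layer coefficients each layer moves by its own fraction, which is the degree of freedom a layer-adaptive method has to set.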

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications