Intrinsic Structure as a Proxy for Saliency: SVD-Based Weight Preservation for Mixed-Precision Quantization in Large Language Models

Shashank Landge; Abhishek Patil; Tejas kamble; Bhushan Buddhivant; Priyanka Joshi

arXiv:2512.01343·cs.LG·December 3, 2025

Open Access

TL;DR

This paper introduces a data-free, SVD-based method for mixed-precision quantization of large language models. It selects the weights to preserve by their intrinsic structural importance rather than by calibration data, and reports improved performance on NLP benchmarks.

Contribution

The paper proposes a novel, data-free weight selection heuristic based on SVD that identifies intrinsically important weights for quantization, outperforming existing methods without requiring calibration data.
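
To make the idea of an SVD-based, data-free selection heuristic concrete, here is a minimal sketch in Python with NumPy. It is not the authors' algorithm. The scoring rule (magnitude of each entry in a low-rank SVD reconstruction), the function names, and the parameters `rank`, `keep_frac`, and `bits` are all assumptions made for illustration. The sketch keeps a small fraction of weights at full precision and applies plain round-to-nearest quantization to the rest.

```python
import numpy as np


def svd_salient_mask(W, rank=8, keep_frac=0.01):
    """Boolean mask marking the keep_frac highest-scoring entries of W.

    Assumed scoring rule (hypothetical, not taken from the paper): an entry's
    score is its magnitude in the rank-`rank` SVD reconstruction of W.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Rank-limited reconstruction built from the leading singular triplets.
    W_low = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]
    k = max(1, int(round(keep_frac * W.size)))
    scores = np.abs(W_low).ravel()
    top = np.argpartition(scores, -k)[-k:]  # indices of the k largest scores
    mask = np.zeros(W.size, dtype=bool)
    mask[top] = True
    return mask.reshape(W.shape)


def quantize_uniform(W, bits=4):
    """Symmetric per-row round-to-nearest quantization, returned dequantized."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero on empty rows
    q = np.clip(np.round(W / scale), -qmax - 1, qmax)
    return q * scale


def mixed_precision_quantize(W, rank=8, keep_frac=0.01, bits=4):
    """Keep SVD-selected weights at full precision and quantize the rest."""
    mask = svd_salient_mask(W, rank, keep_frac)
    # Zero the preserved entries first so they do not inflate the row scales.
    W_q = quantize_uniform(np.where(mask, 0.0, W), bits)
    return np.where(mask, W, W_q), mask
```

Calling `mixed_precision_quantize(W)` on a 2-D weight matrix returns the mixed-precision matrix and the mask of preserved entries. No calibration inputs appear anywhere in the computation, which is the property the contribution emphasizes. The actual selection criterion and bit allocation should be taken from the paper itself.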

Findings

01  SVD-based weight importance correlates with functional importance in models.

02  The method outperforms AWQ and SpQR on the RTE benchmark.

03  Structural importance can serve as a robust proxy for weight saliency.

Abstract

As Large Language Models (LLMs) continue to scale in parameter count, deploying them on commodity hardware has become increasingly challenging. Post-Training Quantization (PTQ) addresses this by reducing the precision of model weights, typically to 4-bit or lower. However, uniform quantization often leads to significant performance degradation due to the presence of "outlier features": weights that, while few in number, are critical for maintaining model accuracy. Current state-of-the-art methods such as AWQ (Activation-aware Weight Quantization) and SpQR (Sparse Quantization Representations) rely on calibration data to identify these salient weights via activation magnitudes or Hessian sensitivity. In scenarios where data privacy is paramount or calibration data is unavailable, these methods are inapplicable. In this work, we propose a data-free, structure-aware hypothesis: that…

Taxonomy

Topics: Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI) · Big Data and Digital Economy