PQuantML: A Tool for End-to-End Hardware-aware Model Compression

Roope Niemi; Anastasiia Petrovych; Arghya Ranjan Das; Enrico Lupi; Chang Sun; Dimitrios Danopoulos; Marlon Joshua Helbing; Mia Liu; Sebastian Dittmeier; Michael Kagan; Vladimir Loncar; Maurizio Pierini

arXiv:2603.26595·cs.LG·March 30, 2026


TL;DR

PQuantML is an open-source, hardware-aware model compression library that simplifies deploying neural networks with pruning and quantization for latency-critical environments.

Contribution

PQuantML introduces a unified interface for end-to-end hardware-aware model compression, supporting multiple pruning methods and fixed-point quantization.

Findings

1. Achieves significant parameter and bit-width reduction while maintaining accuracy.

2. Evaluated on jet tagging, a real-time LHC data processing task.

3. Outperforms existing tools like QKeras and HGQ in compression effectiveness.

Abstract

PQuantML is a new open-source, hardware-aware neural network model compression library tailored to end-to-end workflows. Motivated by the need to deploy performant models to environments with strict latency constraints, PQuantML simplifies the training of compressed models by providing a unified interface for applying pruning and quantization, either jointly or individually. The library implements multiple pruning methods with different granularities, as well as fixed-point quantization with support for High-Granularity Quantization. We evaluate PQuantML on representative tasks such as jet substructure classification (so-called jet tagging), an on-edge problem related to real-time LHC data processing. Using various pruning methods with fixed-point quantization, PQuantML achieves substantial parameter and bit-width reductions while maintaining accuracy. The resulting compression is further…
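The two compression steps named in the abstract, pruning and fixed-point quantization, can be illustrated with a minimal NumPy sketch. This is not PQuantML's API: the function names `magnitude_prune` and `fixed_point_quantize`, their parameters, and the choice of unstructured magnitude pruning are assumptions made for illustration only. The library's own pruning methods and quantizers may differ.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning (hypothetical helper, not PQuantML's API).

    Zeroes the `sparsity` fraction of entries with the smallest absolute value
    and returns the pruned weights together with the boolean keep-mask.
    """
    k = int(round(sparsity * weights.size))  # number of weights to remove
    mask = np.ones(weights.size, dtype=bool)
    if k > 0:
        flat = np.abs(weights).ravel()
        # argpartition places the k smallest magnitudes in the first k slots
        drop = np.argpartition(flat, k - 1)[:k]
        mask[drop] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask


def fixed_point_quantize(x, total_bits, int_bits):
    """Signed fixed-point quantization (hypothetical helper).

    Assumed layout: 1 sign bit, `int_bits` integer bits, and the remaining
    bits fractional. Values are rounded to the nearest representable step
    (ties go to even, as in np.round) and saturated at the range limits.
    """
    frac_bits = total_bits - int_bits - 1
    scale = 2.0 ** frac_bits
    q_min = -(2 ** (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), q_min, q_max)
    return q / scale


# Apply both steps jointly: prune half of the weights, then quantize to 4 bits.
weights = np.array([[0.9, -0.05, 0.3], [-1.7, 0.02, 0.6]])
pruned, mask = magnitude_prune(weights, sparsity=0.5)
quantized = fixed_point_quantize(pruned, total_bits=4, int_bits=1)
print(quantized)
```

In this sketch a single bit width is shared by the whole tensor. High-Granularity Quantization, as mentioned in the abstract, instead allows finer-grained bit-width choices, which would correspond to passing per-weight or per-channel arrays for the bit-width arguments rather than scalars.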
