OneComp: One-Line Revolution for Generative AI Model Compression

Yuma Ichikawa; Keiji Kimura; Akihiro Yoshida; Yudai Fujimoto; Hiroki Tokura; Yamato Arai; Yoshiyuki Ishii; Yusei Kawakami; Genki Shikada; Achille Jacquemond; Yoshihiko Fujisawa; Katsuki Fujisawa; Takumi Honda; Akira Sakai

arXiv:2603.28845·cs.LG·April 1, 2026

TL;DR

OneComp is an open-source framework that automates post-training compression of generative AI models, adapting the quantization plan to the hardware available for deployment.

Contribution

It introduces a resource-adaptive, reproducible pipeline that automates mixed-precision quantization and refinement stages for model compression.

Findings

01 Automates model inspection and quantization planning based on hardware.

02 Progressively refines model compression while maintaining performance.

03 Bridges research and production with an extensible, hardware-aware pipeline.
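
To make the idea of hardware-aware quantization planning concrete, here is a minimal sketch of a greedy mixed-precision planner. It is not OneComp's planner: the `Layer` type, the `sensitivity` score, the `plan_bits` function, and the candidate bit widths are all hypothetical names chosen for illustration. The sketch assumes each layer has a known parameter count and some precomputed sensitivity score, and that the only hardware constraint is a weight-memory budget in bytes.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    num_params: int
    sensitivity: float  # hypothetical score: higher means low precision hurts more


def plan_bits(layers, budget_bytes, choices=(2, 4, 8)):
    """Greedy sketch: start every layer at the lowest bit width, then give the
    most sensitive layers the widest width that still fits the budget."""
    choices = sorted(choices)
    plan = {layer.name: choices[0] for layer in layers}

    def total_bytes():
        # Weight storage only; scales and metadata are ignored in this sketch.
        return sum(l.num_params * plan[l.name] for l in layers) / 8

    if total_bytes() > budget_bytes:
        raise ValueError("budget too small even at the lowest bit width")

    for layer in sorted(layers, key=lambda l: l.sensitivity, reverse=True):
        for bits in reversed(choices):
            extra = layer.num_params * (bits - plan[layer.name]) / 8
            if extra > 0 and total_bytes() + extra <= budget_bytes:
                plan[layer.name] = bits
                break
    return plan


# Toy example with made-up layers and a 2500-byte budget.
toy_layers = [Layer("attn", 1000, 0.9), Layer("mlp", 4000, 0.2)]
toy_plan = plan_bits(toy_layers, budget_bytes=2500)
# toy_plan == {"attn": 8, "mlp": 2}
```

In the toy example the small, sensitive layer is promoted to 8 bits, while the large, less sensitive layer stays at 2 bits because any upgrade would exceed the budget. A real planner would also need to account for kernel support on the target hardware, which this sketch ignores.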

Abstract

Deploying foundation models is increasingly constrained by memory footprint, latency, and hardware costs. Post-training compression can mitigate these bottlenecks by reducing the precision of model parameters without significantly degrading performance; however, its practical implementation remains challenging as practitioners navigate a fragmented landscape of quantization algorithms, precision budgets, data-driven calibration strategies, and hardware-dependent execution regimes. We present OneComp, an open-source compression framework that transforms this expert workflow into a reproducible, resource-adaptive pipeline. Given a model identifier and available hardware, OneComp automatically inspects the model, plans mixed-precision assignments, and executes progressive quantization stages, ranging from layer-wise compression to block-wise refinement and global refinement. A key…
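
As a point of reference for what "reducing the precision of model parameters" means in the simplest case, the sketch below shows symmetric round-to-nearest quantization of a single weight vector. This is a generic baseline, not the algorithm OneComp uses; the function names `quantize_rtn` and `dequantize` are hypothetical, and the sketch assumes a single scale per vector and a bit width of at least 2.

```python
def quantize_rtn(weights, bits):
    """Symmetric round-to-nearest: map floats to integers in [-qmax, qmax]."""
    if bits < 2:
        raise ValueError("this sketch assumes at least 2 bits")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole vector; fall back to 1.0 for all-zero input.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


weights = [0.6, -1.0, 0.3, 0.0]
codes, scale = quantize_rtn(weights, bits=4)   # codes == [4, -7, 2, 0]
approx = dequantize(codes, scale)
```

Each reconstructed weight differs from the original by at most half a quantization step (`scale / 2`). Calibration-based methods and refinement stages of the kind the abstract mentions aim to do better than this baseline at the same bit width.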

