Floating-Point Data Transformation for Lossless Compression

Samirasadat Jamalidinan; Kazem Cheshmi

arXiv:2506.18062·cs.DB·August 11, 2025

TL;DR

This paper introduces Typed Data Transformation (TDT), a data transformation for lossless compression of floating-point data. TDT groups correlated bytes together before compression, improving compression ratio by 1.16× over zstd and compression and decompression throughput by 1.18–3.79×.

Contribution

The paper presents TDT, a new data transformation technique that leverages byte correlations in floating-point data for improved lossless compression performance.

Findings

01

TDT improves compression ratio by 1.16× over zstd.

02

TDT enhances compression and decompression throughput by 1.18–3.79×.

03

TDT is effective across CPU and GPU datasets.

Abstract

Floating-point data is widely used across various domains. Depending on the required precision, each floating-point value can occupy several bytes. Lossless storage of this data is crucial in applications where accuracy is critical, such as medical imaging and language model weights. In these cases, data size is often significant, making lossless compression essential. Previous approaches either treat this data as raw byte streams for compression or fail to leverage all patterns within the dataset. However, because multiple bytes represent a single value, and because floating-point representations have inherent patterns, some of these bytes are correlated. To leverage this property, we propose a novel data transformation method called Typed Data Transformation (TDT) that groups related bytes together to improve compression. We implemented and tested our approach on various…
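
To make the idea of grouping related bytes concrete, here is a minimal Python sketch of a byte-grouping transform applied before a general-purpose compressor. It is not the authors' TDT implementation. It assumes the simplest possible grouping, in which byte k of every value is stored contiguously, whereas the paper may choose its groups differently. The function names `group_bytes` and `ungroup_bytes` are hypothetical, the float32 input is synthetic, and `zlib` from the standard library stands in for zstd so the example has no external dependencies.

```python
import struct
import zlib


def group_bytes(data: bytes, width: int) -> bytes:
    """Store byte k of every value contiguously (assumed grouping, not the paper's)."""
    if len(data) % width != 0:
        raise ValueError("data length must be a multiple of the value width")
    # data[k::width] picks byte k of each width-byte value.
    return b"".join(data[k::width] for k in range(width))


def ungroup_bytes(grouped: bytes, width: int) -> bytes:
    """Exactly invert group_bytes, so the whole pipeline stays lossless."""
    if len(grouped) % width != 0:
        raise ValueError("data length must be a multiple of the value width")
    count = len(grouped) // width
    out = bytearray(len(grouped))
    for k in range(width):
        # Scatter group k back to byte position k of every value.
        out[k::width] = grouped[k * count:(k + 1) * count]
    return bytes(out)


# Synthetic float32 data: a slowly varying ramp, so high-order bytes repeat often.
values = [1.0 + i * 1e-3 for i in range(4096)]
raw = struct.pack(f"<{len(values)}f", *values)

grouped = group_bytes(raw, width=4)
compressed_raw = zlib.compress(raw)
compressed_grouped = zlib.compress(grouped)

# Decompress and undo the grouping to recover the original bytes.
restored = ungroup_bytes(zlib.decompress(compressed_grouped), width=4)

print("raw bytes:         ", len(raw))
print("compressed raw:    ", len(compressed_raw))
print("compressed grouped:", len(compressed_grouped))
print("lossless roundtrip:", restored == raw)
```

The transform only reorders bytes, so it adds no information loss and its inverse is exact. Whether the grouped stream compresses better depends on the data and the compressor, so the sizes printed here should not be read as a reproduction of the paper's results.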
