Compression Performance Analysis of Different File Formats
Han Yang, Guangjun Qin, Yongqing Hu

TL;DR
This study compares the compression performance of 22 different file formats using the Zlib algorithm, highlighting how some formats compress well while others do not, to optimize data storage and transmission efficiency.
Contribution
It provides a comprehensive analysis of compression gains across various file formats, guiding optimal format selection for data reduction.
Findings
Some file formats achieve significant size reduction and faster compression.
Certain formats show minimal compression gains and longer compression times.
The study offers recommendations for selecting file formats based on compression performance.
Abstract
In data storage and transmission, file compression is a common technique for reducing data volume, which saves storage space, transmission time, and bandwidth. However, compression performance differs significantly across file formats, and so do the benefits. In this paper, approximately 178 GB of data in 22 file formats was collected, and compression experiments with the Zlib algorithm were run to compare the compression gains of different file types. The experimental results show that some file types compress poorly: the file size stays almost constant and compression takes a long time, so the gain is low. Other file types shrink significantly and take less time to compress, which effectively reduces the data volume. Based on the above experimental results, this paper will…
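The kind of measurement the abstract describes can be sketched with Python's standard `zlib` module. This is a minimal illustration, not the authors' benchmark: the helper name `measure_compression`, the compression level of 6, the definition of ratio as compressed size divided by original size, and the two synthetic payloads are all assumptions made here for demonstration.

```python
import os
import time
import zlib


def measure_compression(data: bytes, level: int = 6) -> dict:
    """Compress `data` with zlib and report sizes, ratio, and elapsed time.

    Hypothetical helper for illustration; the ratio is defined here as
    compressed_size / original_size (smaller means better compression).
    """
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start

    original_size = len(data)
    compressed_size = len(compressed)
    return {
        "original_bytes": original_size,
        "compressed_bytes": compressed_size,
        "ratio": compressed_size / original_size if original_size else 1.0,
        "seconds": elapsed,
    }


if __name__ == "__main__":
    # Synthetic stand-ins (assumptions): highly repetitive text versus
    # random bytes, which behave like already-compressed content.
    text_like = b"timestamp=2024-01-01 level=INFO msg=ok\n" * 10_000
    random_like = os.urandom(len(text_like))

    for label, payload in (("repetitive text", text_like),
                           ("random bytes", random_like)):
        result = measure_compression(payload)
        print(f"{label}: ratio={result['ratio']:.4f}, "
              f"time={result['seconds'] * 1000:.2f} ms")
```

Under these assumptions, the repetitive payload should shrink to a small fraction of its size, while the random payload should stay at roughly its original size, mirroring the contrast between well-compressing and poorly-compressing formats reported above.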
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems
