Finding Synchronization Codes to Boost Compression by Substring Enumeration
Dany Vohl, Claude-Guy Quimper, Danny Dubé

TL;DR
This paper introduces two constraint models that find the shortest synchronization codes, i.e. those that add the fewest synchronization bits to the data, for use with Compression by Substring Enumeration (CSE).
Contribution
It presents two constraint models that compute the shortest synchronization codes for blocks of up to 64 bits, so that CSE can benefit from synchronization at the lowest cost in added bits.
Findings
Successfully computed shortest synchronization codes for blocks up to 64 bits.
Builds on earlier work showing that inserting a synchronization code into the data before compression improves CSE compression performance.
Provided a new approach to optimize synchronization code length for bit-oriented compression schemes.
Abstract
Synchronization codes are frequently used in numerical data transmission and storage. Compression by Substring Enumeration (CSE) is a new lossless compression scheme that has turned into a new and unusual application for synchronization codes. CSE is an inherently bit-oriented technique. However, since the usual benchmark files are all byte-oriented, CSE incurred a penalty due to a problem called phase unawareness. Subsequent work showed that inserting a synchronization code inside the data before compressing it improves the compression performance. In this paper, we present two constraint models that compute the shortest synchronization codes, i.e. those that add the fewest synchronization bits to the original data. We find synchronization codes for blocks of up to 64 bits.
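To make the idea of inserting synchronization bits concrete, here is a minimal Python sketch of the mechanical step only. It assumes a fixed bit pattern appended after every block. The function names, the default pattern `"1"`, and its placement at the end of each block are illustrative assumptions, not the codes computed by the paper's constraint models, which may place bits differently.

```python
def insert_sync_bits(data: bytes, pattern: str = "1", block_bits: int = 8) -> str:
    """Append `pattern` after every `block_bits`-bit block of `data`.

    Hypothetical illustration: the pattern and its position are assumptions.
    """
    bits = "".join(f"{byte:08b}" for byte in data)
    if len(bits) % block_bits != 0:
        raise ValueError("data length must be a multiple of the block size")
    blocks = [bits[i:i + block_bits] for i in range(0, len(bits), block_bits)]
    return "".join(block + pattern for block in blocks)


def remove_sync_bits(bits: str, pattern: str = "1", block_bits: int = 8) -> bytes:
    """Strip the inserted pattern and rebuild the original bytes."""
    step = block_bits + len(pattern)
    if len(bits) % step != 0:
        raise ValueError("bit string length does not match the block layout")
    payload = "".join(bits[i:i + block_bits] for i in range(0, len(bits), step))
    return bytes(int(payload[i:i + 8], 2) for i in range(0, len(payload), 8))


def overhead(pattern: str, block_bits: int) -> float:
    """Fraction of extra bits added relative to the original data."""
    return len(pattern) / block_bits
```

For example, `insert_sync_bits(b"\x00\xff", "01")` yields a 20-bit string for 16 bits of input, an overhead of 2/8. Minimizing this overhead is the reason the shortest code for a given block size is of interest.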
Taxonomy
Topics: Algorithms and Data Compression · Cellular Automata and Applications · Advanced Data Storage Technologies
