Coding 35GB of Data in 35 Pages of Numbers
Philon Nguyen

TL;DR
This paper explores a novel coding method for large data sets, applying information theory to the difference space of bijective mappings, demonstrated through a 35GB Wikipedia data example.
Contribution
It introduces a new coding approach based on the Hamming radius for difference spaces, extending traditional information-theoretic results to large permutation matrices.
Findings
Efficient coding of 35GB Wikipedia data demonstrated
Application of Hamming radius-based coding to large data sets
Extension of p-adic coding results to difference spaces
Abstract
Standard information-theoretic results give a logarithmic coding factor when value spaces are mapped to digital binary spaces using p-adic numbering systems. This paper discusses a less commonly used case: it applies the same results to the difference space of bijective mappings from n-dimensional spaces to the line. It describes a method in which the logarithmic coding factor is taken over the Hamming radius of the code. An example is given using the 35GB data dump of the Wikipedia website. The technique was initially developed for the study and computation of large permutation matrices on small clusters.
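The two ingredients named in the abstract can be illustrated with a small sketch. It is not the paper's method: the row-major map used for `linearize`, the function names, and the sample points are all assumptions chosen for illustration, and the Hamming-radius step is not modeled. The sketch only shows that labeling N distinct values in base p takes about log_p(N) digits, and that the differences between positions on the line can need fewer digits than the positions themselves.

```python
def digits_needed(n_values: int, base: int = 2) -> int:
    """Smallest number of base-`base` digits that can label n_values distinct values."""
    if n_values < 1 or base < 2:
        raise ValueError("need n_values >= 1 and base >= 2")
    digits, capacity = 1, base
    while capacity < n_values:
        capacity *= base
        digits += 1
    return digits


def linearize(point, shape):
    """Row-major bijection from an n-dimensional grid to the line (an assumed choice)."""
    index = 0
    for coord, size in zip(point, shape):
        if not 0 <= coord < size:
            raise ValueError("coordinate out of range")
        index = index * size + coord
    return index


def differences(indices):
    """Sort line positions and return the first position followed by successive gaps."""
    ordered = sorted(indices)
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]


# Hypothetical sample: four points on a 1024 x 1024 grid.
shape = (1024, 1024)
points = [(3, 1000), (0, 5), (0, 9), (1, 2)]
positions = [linearize(p, shape) for p in points]
gaps = differences(positions)

bits_per_position = digits_needed(shape[0] * shape[1], base=2)
bits_per_gap = digits_needed(max(gaps) + 1, base=2)
```

For this sample, each absolute position needs 20 binary digits, while the largest gap (3046) fits in 12. How much is saved in practice depends entirely on the data and on the bijection chosen.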
Taxonomy
Topics: Advanced Mathematical Theories · Cellular Automata and Applications · Advanced Data Storage Technologies
