Random Permutation Codes: Lossless Source Coding of Non-Sequential Data

Daniel Severo

arXiv:2411.14879·cs.IT·November 25, 2024

TL;DR

This thesis introduces a formal framework for lossless compression of non-sequential data types, using Random Permutation Codes to achieve optimal rates by removing the redundancy that comes from encoding order.

Contribution

It formalizes non-sequential data as Combinatorial Random Variables (CRVs) and develops Random Permutation Codes (RPCs) for their efficient lossless compression.

Findings

01 Achieves full characterization of CRV rates based on data and equivalence relations.

02 Develops specialized RPCs for multisets, graphs, and clusterings.

03 Provides new algorithms for compressing databases, social networks, and web data.

Abstract

This thesis deals with the problem of communicating and storing non-sequential data. We investigate this problem through the lens of lossless source coding, also sometimes referred to as lossless compression, from both an algorithmic and information-theoretic perspective. Lossless compression algorithms typically preserve the ordering in which data points are compressed. However, there are data types where order is not meaningful, such as collections of files, rows in a database, nodes in a graph, and, notably, datasets in machine learning applications. Compressing with traditional algorithms is possible if we pick an order for the elements and communicate the corresponding ordered sequence. However, unless the order information is somehow removed during the encoding process, this procedure will be sub-optimal, because the order contains information and therefore more bits are used…
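
To make the cost of order concrete, here is a minimal sketch of the arithmetic, not the thesis's algorithm. It assumes every distinct ordering of a collection is equally likely (for example, an i.i.d. or exchangeable source), an assumption not stated in the abstract above. Under that assumption, the order of a sequence carries log2 of the number of distinct orderings in bits, which is the most an order-agnostic code could save. The function name `order_bits` is hypothetical.

```python
import math
from collections import Counter


def order_bits(sequence):
    """Bits needed to pick one ordering among all distinct orderings.

    Assumes all distinct orderings of the underlying multiset are
    equally likely (illustrative assumption, not taken from the thesis).
    """
    counts = Counter(sequence)
    # Number of distinct orderings of a multiset: n! / (c_1! * c_2! * ...)
    orderings = math.factorial(len(sequence))
    for c in counts.values():
        orderings //= math.factorial(c)  # exact at every step
    return math.log2(orderings)


if __name__ == "__main__":
    # Four distinct items: 4! = 24 orderings.
    print(order_bits(["w", "x", "y", "z"]))
    # Repeated items reduce the number of distinct orderings.
    print(order_bits(["x", "x", "y"]))
```

When all n elements are distinct this is log2(n!), which grows roughly as n·log2(n), so the potential saving becomes substantial for large collections.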

Taxonomy

Topics: Cooperative Communication and Network Coding · Wireless Communication Security Techniques · DNA and Biological Computing