Semantic Identity Compression: Zero-Error Laws, Rate-Distortion, and Neurosymbolic Necessity
Tristan Simas

TL;DR
This paper establishes fundamental limits and laws governing semantic identity compression in neural and symbolic systems, emphasizing the importance of collision-fiber geometry for exact identity recovery.
Contribution
It introduces finite laws and rate-distortion principles based on collision-fiber geometry, linking structural ambiguity to symbolic identity mechanisms.
Findings
Derived fixed-length converse law for identity recovery
Established exact fiberwise rate-distortion law for finite sources
Connected collision-fiber geometry to query complexity and system structure
Abstract
Symbolic systems operate over precise identities: variables denote specific objects, pointers target precise memory locations, and database keys refer to singular records. Neural embeddings generalize by compressing away semantic detail, but this compression creates collision ambiguity: multiple distinct entities can share the same representation value. Exact identity recovery requires additional information precisely when representation fibers have size greater than one. The residual cost is controlled by a single combinatorial object: the collision-fiber geometry of the representation map, in which the largest collision fiber plays a central role. The finite laws include a tight fixed-length converse, an exact finite-block scaling law, a pointwise adaptive budget, and an exact fiberwise rate-distortion law for…
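To make the notion of a collision fiber concrete, here is a minimal sketch. It groups entities by their representation value and counts the extra bits needed to name one entity inside its fiber. The function names, the toy entities, and the first-letter representation map are all hypothetical. The bit counts come from the standard counting argument (distinguishing k items takes at least ceil(log2 k) bits), not from the paper's exact statements, whose formulas are not reproduced in this abstract.

```python
from collections import defaultdict
from math import ceil, log2


def collision_fibers(entities, rep):
    """Group entities by representation value; each group is one fiber."""
    fibers = defaultdict(list)
    for entity in entities:
        fibers[rep(entity)].append(entity)
    return dict(fibers)


def fixed_length_bits(fibers):
    """Bits for a fixed-length tag that disambiguates the largest fiber."""
    largest = max(len(members) for members in fibers.values())
    return ceil(log2(largest))


def adaptive_bits(fibers):
    """Per-entity bits when the tag length may depend on the entity's fiber."""
    return {
        entity: ceil(log2(len(members)))
        for members in fibers.values()
        for entity in members
    }


# Hypothetical toy example: the "representation" keeps only the first letter.
entities = ["cat", "car", "cow", "dog", "emu"]
fibers = collision_fibers(entities, lambda name: name[0])
fixed = fixed_length_bits(fibers)
adaptive = adaptive_bits(fibers)
```

In this toy case the fiber for "c" holds three entities, so a fixed-length tag needs 2 bits, while "dog" and "emu" sit in singleton fibers and need no extra information at all. This matches the abstract's point that extra information is required exactly when a fiber has more than one member.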
