Community-Aware Vertex Ordering for Reference-Based Graph Compression: A Cross-Encoder Empirical Study
Jimmy Dubuisson

TL;DR
This study introduces a community-aware vertex ordering method for reference-based graph compression and measures how it interacts with the encoder. On graphs with poor initial vertex order, reordering saves 0.3 to 5.4 bits per edge on every dataset and encoder measured.
Contribution
It proposes a two-stage Leiden+LLP vertex ordering (global LLP to seed labels, Leiden community detection, then per-cluster LLP on each induced subgraph), evaluates its impact on compression, and introduces three new reference-based encoders.
Findings
Reordering reduces bits per edge by 0.3 to 5.4 on weakly ordered graphs.
Community-aware ordering benefits even URL-ordered web crawls.
New encoders outperform BVGraph by 2-9% in compression ratio.
Abstract
Reference-based graph compression encodes each vertex's neighbor list relative to a recent vertex, exploiting locality to compress large directed graphs. The dominant tool, WebGraph's BVGraph, fixes a single encoding pipeline and relies on a separately chosen vertex ordering -- typically URL-lexicographic or Layered Label Propagation (LLP). The interaction between ordering and encoder is rarely measured. We propose a two-stage Leiden+LLP vertex ordering -- global LLP to seed labels, Leiden community detection, then per-cluster LLP on each induced subgraph -- and study how it interacts with reference-based compression. On graphs with poor initial vertex order, reordering saves 0.3 to 5.4 bits per edge on every dataset and encoder we measured. The size of that gain is largely insensitive to the encoder: on four of five weakly ordered datasets, four independently parameterised encoders…
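To illustrate why vertex ordering matters for reference-based encoding, here is a minimal sketch under a deliberately simplified cost model. It counts only the "residual" neighbors that cannot be copied from the best reference vertex within a small window of preceding vertices. The function names `relabel` and `residual_count`, the toy graph, and the cost model are all assumptions made for illustration; this is not BVGraph's codec or any of the paper's encoders, which also account for copy lists, intervals, and gap codes.

```python
def relabel(adj, order):
    """Rename vertices so that order[i] becomes vertex i (hypothetical helper)."""
    new_id = {v: i for i, v in enumerate(order)}
    return {new_id[v]: sorted(new_id[u] for u in adj[v]) for v in order}


def residual_count(adj, window=1):
    """Total neighbors that cannot be copied from the best reference.

    For each vertex v, try every vertex in the preceding `window` positions
    as a reference and keep the one that leaves the fewest uncovered
    neighbors. With no usable reference, every neighbor is a residual.
    """
    total = 0
    for v in sorted(adj):
        succ = set(adj[v])
        best = len(succ)
        for r in range(max(0, v - window), v):
            best = min(best, len(succ - set(adj[r])))
        total += best
    return total


# Toy graph: two communities, {0, 2, 4} and {1, 3, 5}. Every vertex links to
# all members of its own community (self-loops included for simplicity).
community_a, community_b = [0, 2, 4], [1, 3, 5]
adj = {v: community_a for v in community_a}
adj.update({v: community_b for v in community_b})

interleaved = [0, 1, 2, 3, 4, 5]   # communities alternate in the ordering
clustered = [0, 2, 4, 1, 3, 5]     # each community is contiguous

cost_interleaved = residual_count(relabel(adj, interleaved), window=1)
cost_clustered = residual_count(relabel(adj, clustered), window=1)
print(cost_interleaved, cost_clustered)
```

In this toy setting, placing each community contiguously lets most vertices copy their whole neighbor list from the previous vertex, which is the kind of locality a community-aware ordering aims to expose.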
