Contextual Tokenization for Graph Inverted Indices

Pritish Chakraborty; Indradyumna Roy; Soumen Chakrabarti; Abir De

arXiv:2510.22479 · cs.LG · November 4, 2025

TL;DR

CORGII introduces a novel graph indexing framework that converts dense graph representations into sparse binary codes, enabling efficient retrieval using inverted indices and improving accuracy-efficiency trade-offs.
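
The core idea in the TL;DR is that once each graph is a bag of discrete tokens, an inverted index can restrict scoring to graphs that share tokens with the query instead of scoring the whole corpus. The minimal sketch below assumes each graph has already been reduced to a set of integer token ids. The names `build_inverted_index` and `retrieve`, the toy corpus, and the overlap-count score are hypothetical illustrations, not CORGII's actual index or scorer.

```python
from __future__ import annotations

from collections import defaultdict


def build_inverted_index(corpus: dict[str, set[int]]) -> dict[int, list[str]]:
    """Map each token id to the list of graph ids whose token set contains it."""
    index: dict[int, list[str]] = defaultdict(list)
    for graph_id in sorted(corpus):  # sorted so postings lists are deterministic
        for token in corpus[graph_id]:
            index[token].append(graph_id)
    return dict(index)


def retrieve(query_tokens: set[int], index: dict[int, list[str]], top_k: int) -> list[tuple[str, int]]:
    """Rank graphs by how many query tokens they contain.

    Only graphs found in the postings lists of the query's tokens are scored;
    every other graph in the corpus is skipped without being touched.
    """
    scores: dict[str, int] = defaultdict(int)
    for token in query_tokens:
        for graph_id in index.get(token, []):
            scores[graph_id] += 1
    # Highest overlap first; ties broken by graph id for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical toy corpus: graph id -> set of discrete token ids.
corpus = {"g1": {3, 5, 9}, "g2": {3, 7}, "g3": {1, 2}}
index = build_inverted_index(corpus)
print(retrieve({3, 5}, index, top_k=2))  # → [('g1', 2), ('g2', 1)]
```

Graph `g3` shares no token with the query, so it never appears in any probed postings list and is never scored.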

Contribution

CORGII is the first method to index dense graph representations as discrete tokens stored in inverted lists. It also incorporates trainable impact weights and token expansion to improve retrieval performance.
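
To make the two ingredients in the contribution concrete, here is a toy sketch of impact-weighted postings combined with query-side token expansion. Everything specific is an assumption: the impact weights are hand-picked placeholders standing in for learned values, tokens are treated as 3-bit codes, and expanding each query token to the codes at Hamming distance 1 with a fixed down-weight is one hypothetical expansion rule, not necessarily the one CORGII uses.

```python
from __future__ import annotations

from collections import defaultdict

# token id -> list of (graph id, impact weight). In a trained system the
# impact weights would be learned; these are hand-picked placeholders.
Postings = dict[int, list[tuple[str, float]]]


def expand_query(tokens: set[int], num_bits: int, neighbor_weight: float) -> dict[int, float]:
    """Hypothetical expansion rule: keep each query token with weight 1.0 and
    add every code at Hamming distance 1 with a smaller weight."""
    expanded: dict[int, float] = {}
    for token in tokens:
        expanded[token] = max(expanded.get(token, 0.0), 1.0)
        for bit in range(num_bits):
            neighbor = token ^ (1 << bit)  # flip one bit of the code
            expanded[neighbor] = max(expanded.get(neighbor, 0.0), neighbor_weight)
    return expanded


def score(query_weights: dict[int, float], postings: Postings) -> dict[str, float]:
    """Accumulate query weight times impact weight over matching postings."""
    scores: dict[str, float] = defaultdict(float)
    for token, q_weight in query_weights.items():
        for graph_id, impact in postings.get(token, []):
            scores[graph_id] += q_weight * impact
    return dict(scores)


postings: Postings = {
    0b101: [("g1", 0.9)],
    0b100: [("g2", 0.8)],
    0b011: [("g3", 0.7)],
}
query = expand_query({0b101}, num_bits=3, neighbor_weight=0.5)
print(score(query, postings))
```

Here `g1` scores 0.9 through the exact token, `g2` scores about 0.4 only because expansion reaches its neighboring code `0b100`, and `g3` (two bit flips away) is never visited.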

Findings

01. CORGII achieves better accuracy-efficiency trade-offs than the baselines.
02. The framework supports soft (vector) set containment scoring.
03. Extensive experiments validate its effectiveness.
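
Soft set containment replaces exact token matching with a graded, asymmetric score between sets of vectors. The sketch below uses a hinge penalty, a form common in neural subgraph matching, in which a query vector counts as fully covered when some corpus vector dominates it coordinate-wise. The nearest-match alignment (which may map several query vectors to the same corpus vector) is a simplification assumed here for brevity; it is not claimed to be CORGII's scoring function.

```python
from __future__ import annotations

Vector = list[float]


def hinge_violation(q: Vector, c: Vector) -> float:
    """Total amount by which q exceeds c, summed over coordinates (0 if q <= c)."""
    return sum(max(0.0, qk - ck) for qk, ck in zip(q, c))


def soft_containment_score(query: list[Vector], corpus: list[Vector]) -> float:
    """Asymmetric containment score of `query` inside `corpus`.

    Each query vector is matched to the corpus vector it violates least
    (a non-injective nearest match, assumed here for brevity). The score is
    the negated total violation: 0 means every query vector is dominated by
    some corpus vector, and more negative means weaker containment.
    """
    total = sum(min(hinge_violation(q, c) for c in corpus) for q in query)
    return -total


query = [[0.2, 0.1], [0.5, 0.4]]
corpus_a = [[0.3, 0.2], [0.6, 0.9], [0.1, 0.0]]  # dominates both query vectors
corpus_b = [[0.1, 0.1], [0.4, 0.4]]              # falls slightly short
print(soft_containment_score(query, corpus_a), soft_containment_score(query, corpus_b))
```

With these toy vectors `corpus_a` scores 0 and `corpus_b` scores about -0.1. Swapping the arguments generally changes the score, which is the intended asymmetry: containment of the query in a corpus graph is not the same as the reverse.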

Abstract

Retrieving graphs that contain a subgraph isomorphic to a given query graph from a large corpus is a core operation in many real-world applications. While recent multi-vector graph representations and scores based on set alignment and containment can provide accurate subgraph isomorphism tests, their use in retrieval remains limited by their need to score corpus graphs exhaustively. We introduce CORGII (Contextual Representation of Graphs for Inverted Indexing), a graph indexing framework in which, starting with a contextual dense graph representation, a differentiable discretization module computes sparse binary codes over a learned latent vocabulary. This text-document-like representation allows us to leverage classic, highly optimized inverted indices, while supporting soft (vector) set containment scores. Pushing this paradigm further, we replace the classical, fixed impact weight…
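
The discretization step described in the abstract maps contextual node embeddings to binary codes. Below is a forward-pass sketch under stated assumptions: a hypothetical linear projection with one row per bit, a sigmoid, and a 0.5 threshold, with each code packed into an integer token id. The abstract does not specify how training stays differentiable; a straight-through estimator over the sigmoid probabilities is one common choice, and only the hard codes would be written to the index.

```python
from __future__ import annotations

import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def discretize(embedding: list[float], projection: list[list[float]],
               temperature: float = 1.0) -> tuple[list[float], list[int]]:
    """Return (soft bit probabilities, hard 0/1 code) for one node embedding.

    `projection` holds one row of hypothetical learned weights per output bit.
    The soft probabilities are what a relaxed or straight-through training
    scheme would differentiate; only the hard code is used for indexing.
    """
    logits = [sum(w * x for w, x in zip(row, embedding)) for row in projection]
    probs = [sigmoid(z / temperature) for z in logits]
    hard = [1 if p > 0.5 else 0 for p in probs]
    return probs, hard


def code_to_token(bits: list[int]) -> int:
    """Pack a binary code into an integer token id, most significant bit first."""
    token = 0
    for b in bits:
        token = (token << 1) | b
    return token


def graph_to_tokens(node_embeddings: list[list[float]],
                    projection: list[list[float]]) -> set[int]:
    """Represent a graph as the set of tokens of its nodes' binary codes."""
    return {code_to_token(discretize(e, projection)[1]) for e in node_embeddings}


projection = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # 3 bits from 2-d embeddings
nodes = [[0.5, -0.2], [-0.3, 0.4], [0.2, -0.1]]     # hypothetical contextual embeddings
print(graph_to_tokens(nodes, projection))  # → {2, 5}
```

The first and third nodes have similar embeddings and collapse to the same token (5), so this three-node graph becomes the two-token document {2, 5}.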

Videos

Contextual Tokenization for Graph Inverted Indices (SlidesLive)