Extracting Document Relations from Search Corpus by Marginalizing over User Queries

Yuki Iwamoto; Kaoru Tsunoda; Ken Kaneiwa

arXiv:2507.10726·cs.IR·July 16, 2025

Open Access

TL;DR

This paper introduces EDR-MQ, a novel framework that discovers document relationships by analyzing co-occurrence patterns across diverse user queries, eliminating the need for manual annotations or predefined taxonomies.

Contribution

The paper presents a new query marginalization approach and the Multiply Conditioned Retrieval-Augmented Generation (MC-RAG) method, which together estimate document relationships from search results without labeled data.

Findings

1. Successfully identifies topical clusters and evidence chains.
2. Reveals cross-domain connections not found by traditional methods.
3. Adapts to different user perspectives and information needs.

Abstract

Understanding relationships between documents in large-scale corpora is essential for knowledge discovery and information organization. However, existing approaches rely heavily on manual annotation or predefined relationship taxonomies. We propose EDR-MQ (Extracting Document Relations by Marginalizing over User Queries), a novel framework that discovers document relationships through query marginalization. EDR-MQ is based on the insight that strongly related documents often co-occur in results across diverse user queries, enabling us to estimate joint probabilities between document pairs by marginalizing over a collection of queries. To enable this query marginalization approach, we develop Multiply Conditioned Retrieval-Augmented Generation (MC-RAG), which employs conditional retrieval where subsequent document retrievals depend on previously retrieved content. By observing…
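The abstract is truncated and does not give the paper's exact estimator, so the following is only a minimal sketch of the general idea under stated assumptions. It estimates a joint probability over document pairs by marginalizing over a query collection, P(d_i, d_j) = Σ_q P(q) · P(d_i | q) · P(d_j | q, d_i). The second factor plays the role of conditional retrieval: the second document is retrieved using the query together with the content of the first retrieved document. Everything here is hypothetical and not the authors' implementation: the uniform prior P(q), the Jaccard-overlap "retriever" with a softmax over scores, the toy corpus and queries, and all function names (`tokens`, `softmax`, `jaccard`, `retrieve`, `joint_doc_probs`).

```python
import math
from collections import defaultdict


def tokens(text: str) -> set[str]:
    """Lowercased bag of words; a toy stand-in for a real retriever's encoder."""
    return set(text.lower().split())


def softmax(scores: dict[str, float], temperature: float = 0.1) -> dict[str, float]:
    """Turn retriever scores into a probability distribution over documents."""
    if not scores:
        return {}
    top = max(scores.values())
    exps = {d: math.exp((s - top) / temperature) for d, s in scores.items()}
    z = sum(exps.values())
    return {d: e / z for d, e in exps.items()}


def jaccard(a: set[str], b: set[str]) -> float:
    """Hypothetical relevance score: token overlap between context and document."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(context: set[str], corpus: dict[str, set[str]], exclude=frozenset()) -> dict[str, float]:
    """P(d | context) over all documents not listed in `exclude`."""
    scores = {d: jaccard(context, toks) for d, toks in corpus.items() if d not in exclude}
    return softmax(scores)


def joint_doc_probs(queries: list[str], corpus: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Estimate P(d_i, d_j) = sum_q P(q) * P(d_i | q) * P(d_j | q, d_i)."""
    joint: dict[tuple[str, str], float] = defaultdict(float)
    p_q = 1.0 / len(queries)  # assumption: uniform prior over the query collection
    for q in queries:
        q_toks = tokens(q)
        first = retrieve(q_toks, corpus)  # P(d_i | q)
        for d_i, p_i in first.items():
            # Conditional second retrieval: the context is the query plus the
            # content of d_i, and d_i itself is excluded so pairs are distinct.
            second = retrieve(q_toks | corpus[d_i], corpus, exclude={d_i})
            for d_j, p_j in second.items():
                joint[(d_i, d_j)] += p_q * p_i * p_j
    return dict(joint)


# Hypothetical toy corpus with two topical groups and a small query collection.
corpus = {
    "d1": tokens("transformer attention language model"),
    "d2": tokens("attention mechanism transformer architecture"),
    "d3": tokens("soil erosion river sediment"),
    "d4": tokens("river flood sediment transport"),
}
queries = ["transformer attention", "language model attention", "river sediment", "flood river"]
joint = joint_doc_probs(queries, corpus)

# Symmetric relation strength: sum both retrieval orders of each unordered pair.
strength: dict[tuple[str, str], float] = defaultdict(float)
for (a, b), p in joint.items():
    strength[tuple(sorted((a, b)))] += p

for pair, s in sorted(strength.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(s, 3))
```

Because each query contributes a full distribution over ordered pairs, the estimated joint probabilities sum to one. In this toy setup, documents that keep appearing together across different queries (d1 with d2, d3 with d4) receive most of the mass. A real system would swap the overlap score for a dense or sparse retriever and would likely use a larger, more diverse query set.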

Taxonomy

Topics: Web Data Mining and Analysis · Semantic Web and Ontologies · Advanced Text Analysis Techniques