Query-driven Frequent Co-occurring Term Extraction over Relational Data using MapReduce
Jianxin Li, Chengfei Liu, Liang Yao, Jeffrey Xu Yu, Rui Zhou

TL;DR
This paper presents a MapReduce-based method for efficiently extracting the most frequent co-occurring terms in query results, aiding query expansion and refinement in large-scale relational data.
Contribution
It introduces a novel parallel approach that computes frequent co-occurring terms without precomputing query results, ensuring load balancing and scalability.
Findings
Efficient and scalable extraction of frequent co-occurring terms (FCT) is demonstrated on TPC-H datasets.
A framework of two MapReduce jobs effectively balances load and computation.
The method outperforms traditional single-platform approaches on large datasets.
Abstract
In this paper we study how to efficiently compute frequent co-occurring terms (FCT) in the results of a keyword query in parallel using the popular MapReduce framework. Taking as input a keyword query q and an integer k, an FCT query reports the k terms that are not in q but appear most frequently in the results of the keyword query q over multiple joined relations. The terms returned by an FCT search can be used for query expansion and query refinement in traditional keyword search. Unlike FCT search on a single platform, our proposed approach runs on a parallel platform and can efficiently answer an FCT query using the MapReduce paradigm without pre-computing the results of the original keyword query. In this work, we output the final FCT search results using two MapReduce jobs: the first extracts the statistical information of the data; and the…
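To make the FCT query definition concrete, here is a minimal single-machine sketch in Python. It is not the paper's MapReduce method: it materializes the keyword-query results first, which is exactly what the proposed approach avoids. The function name `fct_naive`, the toy rows, the AND matching semantics, and counting each term once per result row are all assumptions made for illustration.

```python
from collections import Counter


def fct_naive(joined_rows, query, k):
    """Return the k most frequent non-query terms among rows matching `query`.

    Hypothetical baseline: each row stands for one already-joined tuple,
    flattened to whitespace-separated text.
    """
    q = {t.lower() for t in query}
    counts = Counter()
    for row in joined_rows:
        terms = row.lower().split()
        # Assumption: a row is a result only if it contains every query keyword.
        if q.issubset(terms):
            # Assumption: count each distinct non-query term once per result row.
            counts.update(t for t in set(terms) if t not in q)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy data standing in for joined relational tuples (made up for this example).
rows = [
    "mapreduce parallel join hadoop",
    "mapreduce keyword search hadoop",
    "keyword search relational",
    "mapreduce hadoop cluster",
]
top = fct_naive(rows, ["mapreduce"], 2)
print(top)
```

With q = {mapreduce} and k = 2, three of the four rows match, and the query term itself is excluded from the returned list, as the definition requires.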
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Quality and Management
