Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes
Feras Saad, Leonardo Casarsa, Vikash Mansinghka

TL;DR
This paper presents a probabilistic search method for structured data using probabilistic programming and nonparametric Bayes, enabling more relevant search results in complex databases without extensive domain knowledge.
Contribution
The paper introduces a search approach that combines probabilistic programming, nonparametric Bayesian modeling, and information-theoretic ranking, and integrates it into the BayesDB platform so that it applies to a broad class of data retrieval tasks.
Findings
Human evaluators preferred probabilistic search results over baselines.
The method effectively retrieves relevant data in various real-world databases.
Fast sparse matrix algorithms enable efficient computation of predictive relevance.
Abstract
Databases are widespread, yet extracting relevant data can be difficult. Without substantial domain knowledge, multivariate search queries often return sparse or uninformative results. This paper introduces an approach for searching structured data based on probabilistic programming and nonparametric Bayes. Users specify queries in a probabilistic language that combines standard SQL database search operators with an information theoretic ranking function called predictive relevance. Predictive relevance can be calculated by a fast sparse matrix algorithm based on posterior samples from CrossCat, a nonparametric Bayesian model for high-dimensional, heterogeneously-typed data tables. The result is a flexible search technique that applies to a broad class of information retrieval problems, which we integrate into BayesDB, a probabilistic programming platform for probabilistic data…
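As a rough illustration of the idea behind ranking rows from posterior cluster samples, the sketch below scores each row by the fraction of samples in which it shares a cluster with a query row. This is a simplified stand-in and not the paper's definition of predictive relevance or its algorithm: the names `co_cluster_relevance`, `rank`, and `SAMPLES` are hypothetical, the cluster assignments are made-up toy data rather than CrossCat output, and the inverted index is only a plain-Python analogue of a sparse matrix product.

```python
from collections import defaultdict

# Toy data (hypothetical): each inner list is one posterior sample and
# gives a cluster label for each of 5 table rows.
SAMPLES = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 1, 0],
]


def co_cluster_relevance(samples, query_row):
    """Return {row: fraction of samples where row shares the query row's cluster}.

    Rows that never share a cluster with the query row are omitted,
    so the result stays sparse.
    """
    counts = defaultdict(int)
    for assignment in samples:
        # Inverted index for this sample: cluster label -> member rows.
        members = defaultdict(list)
        for row, cluster in enumerate(assignment):
            members[cluster].append(row)
        # Only rows in the query row's cluster are touched.
        for row in members[assignment[query_row]]:
            if row != query_row:
                counts[row] += 1
    n_samples = len(samples)
    return {row: count / n_samples for row, count in counts.items()}


def rank(samples, query_row, limit=None):
    """Sort candidate rows by descending score, breaking ties by row index."""
    scores = co_cluster_relevance(samples, query_row)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    for row, score in rank(SAMPLES, query_row=0):
        print(f"row {row}: {score:.3f}")
```

Because only the members of the query row's cluster are visited in each sample, the work per sample scales with the cluster size rather than with the full table, which is the same intuition that makes a sparse representation attractive here.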
Taxonomy
Topics: Data Management and Algorithms · Bayesian Modeling and Causal Inference · Bayesian Methods and Mixture Models
