Pre and Post Counting for Scalable Statistical-Relational Model Discovery

Richard Mar; Oliver Schulte

arXiv:2110.09767·cs.LG·October 20, 2021

TL;DR

This paper compares pre-counting and post-counting strategies for scalable statistical-relational model discovery and introduces a hybrid approach that balances memory use and speed, scaling to millions of data facts.

Contribution

The paper presents a novel hybrid counting method tailored to relational data, which improves scalability by combining pre-counting and post-counting.

Findings

1. Hybrid approach scales to millions of data facts.

2. Pre-counting benefits positive relationships.

3. Post-counting benefits negative relationships.

Abstract

Statistical-Relational Model Discovery aims to find statistically relevant patterns in relational data. For example, a relational dependency pattern may stipulate that a user's gender is associated with the gender of their friends. As with propositional (non-relational) graphical models, the major scalability bottleneck for model discovery is computing instantiation counts: the number of times a relational pattern is instantiated in a database. Previous work on propositional learning utilized pre-counting or post-counting to solve this task. This paper takes a detailed look at the memory and speed trade-offs between pre-counting and post-counting strategies for relational learning. A pre-counting approach computes and caches instantiation counts for a large set of relational patterns before model search. A post-counting approach computes an instantiation count dynamically on-demand for…
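The difference between the two strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the toy database, the function names `precount_positive` and `postcount`, and the restriction to a single pattern family (the genders of two users plus whether they are friends) are all assumptions made for illustration. Pre-counting builds a cache of counts in one pass before any search; post-counting scans for one pattern only when it is requested.

```python
from collections import Counter
from itertools import product

# Hypothetical toy database: one attribute table and one relationship table.
gender = {"ann": "F", "bob": "M", "cat": "F", "dan": "M"}
friend = {("ann", "cat"), ("cat", "ann"), ("ann", "bob"), ("bob", "ann")}


def precount_positive(gender, friend):
    """Pre-counting: a single pass over the friendship facts caches the
    instantiation count of every (gender(A), gender(B)) pattern with
    Friend(A, B) = True, before any model search starts."""
    return Counter((gender[a], gender[b]) for a, b in friend)


def postcount(gender, friend, g_a, g_b, is_friend):
    """Post-counting: compute the count of one pattern on demand by
    scanning all ordered pairs of distinct users."""
    return sum(
        1
        for a, b in product(gender, repeat=2)
        if a != b
        and gender[a] == g_a
        and gender[b] == g_b
        and ((a, b) in friend) == is_friend
    )


# Cache built once, then queried cheaply during search.
cache = precount_positive(gender, friend)
female_female_friends = cache[("F", "F")]

# A pattern with a negative relationship, counted only when needed.
female_male_non_friends = postcount(gender, friend, "F", "M", False)
```

In this sketch the cache costs memory for every pattern whether or not the search ever asks for it, while the on-demand scan costs time on each request. That is the memory and speed trade-off the abstract describes.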

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Data Mining Algorithms and Applications · Data Management and Algorithms

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings