Towards Probabilistic Question Answering Over Tabular Data
Chen Shen, Sajjadur Rahman, Estevam Hruschka

TL;DR
This paper introduces a new benchmark and framework for probabilistic question answering over large tabular data, combining Bayesian Networks and large language models to improve reasoning under uncertainty.
Contribution
It presents a novel approach that induces Bayesian Networks from tables and translates natural language queries into probabilistic queries, advancing probabilistic QA methods.
Findings
Significant performance improvements over baselines
Effective hybrid symbolic-neural reasoning approach
Demonstrated benefits of probabilistic reasoning in QA
Abstract
Current approaches for question answering (QA) over tabular data, such as NL2SQL systems, perform well for factual questions where answers are directly retrieved from tables. However, they fall short on probabilistic questions requiring reasoning under uncertainty. In this paper, we introduce a new benchmark, LUCARIO, and a framework for probabilistic QA over large tabular data. Our method induces Bayesian Networks from tables, translates natural language queries into probabilistic queries, and uses large language models (LLMs) to generate final answers. Empirical results demonstrate significant improvements over baselines, highlighting the benefits of hybrid symbolic-neural reasoning.
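To make the pipeline in the abstract concrete, here is a minimal sketch of the symbolic half only: fitting a small Bayesian Network to a table and answering a conditional probability query by enumeration. It is an illustration under stated assumptions, not the authors' implementation. The toy table, the column names, the hand-specified network structure, and the helpers `cpt`, `joint_prob`, and `query` are all hypothetical; the paper induces the structure from data and uses an LLM to translate the question and phrase the final answer, neither of which is shown here.

```python
from collections import Counter
from itertools import product

# Toy table (hypothetical data, not taken from the LUCARIO benchmark).
rows = (
    [{"smoker": "yes", "exercise": "low", "disease": "yes"}] * 3
    + [{"smoker": "yes", "exercise": "high", "disease": "yes"}]
    + [{"smoker": "yes", "exercise": "high", "disease": "no"}]
    + [{"smoker": "no", "exercise": "low", "disease": "yes"}]
    + [{"smoker": "no", "exercise": "low", "disease": "no"}]
    + [{"smoker": "no", "exercise": "high", "disease": "no"}] * 3
)

# Assumed structure: smoker -> disease <- exercise.
# It is written by hand here; structure learning is out of scope for this sketch.
parents = {"smoker": [], "exercise": [], "disease": ["smoker", "exercise"]}


def domain(var):
    """All values a column takes in the table."""
    return sorted({r[var] for r in rows})


def cpt(var, alpha=1.0):
    """Estimate P(var | parents) from counts, with Laplace smoothing alpha."""
    pa = parents[var]
    joint = Counter((tuple(r[p] for p in pa), r[var]) for r in rows)
    marginal = Counter(tuple(r[p] for p in pa) for r in rows)
    k = len(domain(var))

    def prob(value, assignment):
        key = tuple(assignment[p] for p in pa)
        return (joint[(key, value)] + alpha) / (marginal[key] + alpha * k)

    return prob


cpts = {v: cpt(v) for v in parents}


def joint_prob(assignment):
    """Joint probability of a full assignment via the chain rule of the network."""
    p = 1.0
    for v in parents:
        p *= cpts[v](assignment[v], assignment)
    return p


def query(target, value, evidence):
    """P(target = value | evidence), by enumerating every full assignment."""
    variables = list(parents)
    num = den = 0.0
    for combo in product(*(domain(v) for v in variables)):
        a = dict(zip(variables, combo))
        if any(a[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint_prob(a)
        den += p
        if a[target] == value:
            num += p
    return num / den


# A question such as "How likely is disease for a smoker?" would be
# translated (by a component not shown) into this probabilistic query.
p_smoker = query("disease", "yes", {"smoker": "yes"})
p_nonsmoker = query("disease", "yes", {"smoker": "no"})
print(round(p_smoker, 3), round(p_nonsmoker, 3))
```

Enumeration is exponential in the number of variables, so it only suits toy networks; a table of realistic size would need a proper inference algorithm such as variable elimination.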
Taxonomy
Topics: Topic Modeling · Data Management and Algorithms · Semantic Web and Ontologies
