The Choice of Knowledge Base in Automated Claim Checking
Dominik Stammbach, Boya Zhang, Elliott Ash

TL;DR
This paper investigates how the choice of knowledge base affects automated claim checking. It finds that higher domain overlap between the task dataset and the knowledge base tends to improve label accuracy, and that combining multiple knowledge bases offers limited benefit over using the closest-domain one.
Contribution
It demonstrates that claim-checking pipelines can transfer across domains, highlights the importance of domain overlap, and introduces confidence scores for knowledge base suitability assessment.
Findings
Higher domain overlap between a task dataset and a knowledge base tends to produce better label accuracy.
Combining multiple knowledge bases offers limited gains.
Confidence scores can predict knowledge base performance.
Abstract
Automated claim checking is the task of determining the veracity of a claim given evidence found in a knowledge base of trustworthy facts. While previous work has taken the knowledge base as given and optimized the claim-checking pipeline, we take the opposite approach - taking the pipeline as given, we explore the choice of knowledge base. Our first insight is that a claim-checking pipeline can be transferred to a new domain of claims with access to a knowledge base from the new domain. Second, we do not find a "universally best" knowledge base - higher domain overlap of a task dataset and a knowledge base tends to produce better label accuracy. Third, combining multiple knowledge bases does not tend to improve performance beyond using the closest-domain knowledge base. Finally, we show that the claim-checking pipeline's confidence score for selecting evidence can be used to assess…
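The idea of using evidence-selection confidence to compare knowledge bases can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the knowledge base labels `kb_a` and `kb_b`, the scores, and the aggregation (mean of each claim's highest evidence score) are all assumptions made for illustration.

```python
from statistics import mean

def mean_top_confidence(scores_per_claim):
    """Average, over claims, of the highest evidence-selection score per claim.

    scores_per_claim: list of lists; each inner list holds the (hypothetical)
    confidence scores the pipeline assigned to evidence retrieved for one claim.
    Claims with no retrieved evidence are skipped.
    """
    return mean(max(scores) for scores in scores_per_claim if scores)

def rank_knowledge_bases(confidences_by_kb):
    """Return knowledge base names ordered from highest to lowest mean confidence."""
    return sorted(
        confidences_by_kb,
        key=lambda kb: mean_top_confidence(confidences_by_kb[kb]),
        reverse=True,
    )

# Hypothetical scores for two claims against two candidate knowledge bases.
confidences_by_kb = {
    "kb_a": [[0.9, 0.4], [0.8, 0.7]],
    "kb_b": [[0.3, 0.2], [0.5, 0.1]],
}

ranking = rank_knowledge_bases(confidences_by_kb)
print(ranking)
```

Under these assumptions, the knowledge base whose retrieved evidence receives higher confidence is ranked first, which is one simple way such a score could serve as a suitability signal without needing gold labels for the new domain.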
