Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification
Anisha Gunjal, Greg Durrett

TL;DR
This paper explores how to better verify facts in large language model outputs by defining and generating 'molecular facts' that balance standing alone against adding as little extra context as possible, improving verification accuracy in ambiguous cases.
Contribution
It introduces the concept of molecular facts with criteria for decontextuality and minimality, and proposes a baseline method for automatic generation to enhance fact verification.
Findings
Molecular facts improve verification accuracy in ambiguous contexts.
Decontextuality and minimality are the two criteria that define an effective molecular fact.
Baseline methods effectively generate balanced molecular facts.
Abstract
Automatic factuality verification of large language model (LLM) generations is becoming more and more widely used to combat hallucinations. A major point of tension in the literature is the granularity of this fact-checking: larger chunks of text are hard to fact-check, but more atomic facts like propositions may lack context to interpret correctly. In this work, we assess the role of context in these atomic facts. We argue that fully atomic facts are not the right representation, and define two criteria for molecular facts: decontextuality, or how well they can stand alone, and minimality, or how little extra information is added to achieve decontextuality. We quantify the impact of decontextualization on minimality, then present a baseline methodology for generating molecular facts automatically, aiming to add the right amount of information. We compare against various methods of…
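To make the two criteria concrete, here is a minimal sketch in Python. It is not the paper's method or metric: the example claims, the name `added_token_count`, and the use of a token count as a stand-in for "how little extra information is added" are all assumptions made for illustration. Decontextuality is not measured here at all; the sketch only shows that a rewrite which resolves the pronoun can add far less text than one that also piles on unneeded detail.

```python
import re
from collections import Counter


def added_token_count(atomic: str, rewritten: str) -> int:
    """Count tokens in `rewritten` that are not accounted for by `atomic`.

    Hypothetical proxy for minimality: fewer added tokens means less
    extra information was introduced during decontextualization.
    """
    atomic_tokens = Counter(re.findall(r"\w+", atomic.lower()))
    rewritten_tokens = Counter(re.findall(r"\w+", rewritten.lower()))
    # Multiset difference keeps only the surplus tokens in the rewrite.
    extra = rewritten_tokens - atomic_tokens
    return sum(extra.values())


# Invented example claims (not taken from the paper).
atomic = "He won the award in 2010."
molecular = "John Smith, the British chemist, won the award in 2010."
over_contextualized = (
    "John Smith, the British chemist born in Leeds who studied at Oxford, "
    "won the award in 2010."
)

molecular_cost = added_token_count(atomic, molecular)
over_cost = added_token_count(atomic, over_contextualized)
print(molecular_cost, over_cost)
```

In this toy case the atomic claim cannot be checked on its own because "He" is unresolved, while both rewrites can stand alone. The shorter rewrite is closer to what the abstract calls a molecular fact, since it adds only what is needed to identify the subject.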
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
