VeriFastScore: Speeding up long-form factuality evaluation
Rishanth Rajendhran, Amir Zadeh, Matthew Sarte, Chuan Li, Mohit Iyyer

TL;DR
VeriFastScore is a fine-tuned Llama3.1 8B model that evaluates long-form factuality by extracting and verifying claims simultaneously, running 6.6x faster than VeriScore while correlating strongly with its scores.
Contribution
We introduce VeriFastScore, a fine-tuned model that speeds up long-form factuality evaluation by extracting and verifying all claims in a response at once, reducing reliance on the many LLM calls that decompose-then-verify metrics require.
Findings
Achieves a 6.6x speedup over VeriScore
Correlates strongly with VeriScore (r=0.80 at the example level, r=0.94 at the system level)
Handles long, noisy evidence (~4K tokens on average)
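The two correlation figures above are measured at different granularities. The sketch below illustrates the distinction, assuming r denotes Pearson correlation and using a hypothetical record layout (the `system`, `reference`, and `fast` keys are illustrative names, not taken from the paper): example-level correlation compares scores response by response, while system-level correlation first averages scores per system and then compares the averages.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def example_level_r(records):
    """Correlate the two metrics across individual responses."""
    return pearson([r["reference"] for r in records],
                   [r["fast"] for r in records])


def system_level_r(records):
    """Average each metric per system, then correlate the per-system means."""
    by_system = defaultdict(list)
    for r in records:
        by_system[r["system"]].append(r)
    ref_means, fast_means = [], []
    for rows in by_system.values():
        ref_means.append(sum(r["reference"] for r in rows) / len(rows))
        fast_means.append(sum(r["fast"] for r in rows) / len(rows))
    return pearson(ref_means, fast_means)
```

Averaging within each system smooths out per-response noise, which is one reason a system-level correlation can come out higher than the example-level one.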
Abstract
Metrics like FactScore and VeriScore that evaluate long-form factuality operate by decomposing an input response into atomic claims and then individually verifying each claim. While effective and interpretable, these methods incur numerous LLM calls and can take upwards of 100 seconds to evaluate a single response, limiting their practicality in large-scale evaluation and training scenarios. To address this, we propose VeriFastScore, which leverages synthetic data to fine-tune Llama3.1 8B for simultaneously extracting and verifying all verifiable claims within a given text based on evidence from Google Search. We show that this task cannot be solved via few-shot prompting with closed LLMs due to its complexity: the model receives ~4K tokens of evidence on average and needs to concurrently decompose claims, judge their verifiability, and verify them against noisy evidence. However, our…
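The difference in model calls between the two approaches described in the abstract can be sketched as follows. This is a toy illustration rather than the authors' implementation: the `toy_*` functions are hypothetical stand-ins for model calls (sentence splitting and set membership replace real claim extraction and verification), and `CallCounter` exists only to count invocations.

```python
from typing import Callable, List, Set, Tuple

Verdict = Tuple[str, bool]  # (claim text, supported by the evidence?)


class CallCounter:
    """Wraps a callable and counts how many times it is invoked."""

    def __init__(self, fn: Callable):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)


# Hypothetical stand-ins for model calls (assumptions, not real models).
def toy_decompose(response: str) -> List[str]:
    return [s.strip() for s in response.split(".") if s.strip()]


def toy_verify_one(claim: str, evidence: Set[str]) -> bool:
    return claim in evidence


def toy_extract_and_verify(response: str, evidence: Set[str]) -> List[Verdict]:
    return [(c, c in evidence) for c in toy_decompose(response)]


def pipeline_eval(response, evidence, decompose, verify_one) -> List[Verdict]:
    """Decompose-then-verify: one call to decompose, then one call per claim."""
    claims = decompose(response)
    return [(c, verify_one(c, evidence)) for c in claims]


def single_pass_eval(response, evidence, extract_and_verify) -> List[Verdict]:
    """Single pass: one call returns every claim with its verdict."""
    return extract_and_verify(response, evidence)
```

Wrapping the stand-ins in `CallCounter` shows the pipeline making one call plus one per claim, while the single-pass variant makes exactly one call regardless of how many claims the response contains.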
Taxonomy
Topics: Misinformation and Its Impacts · Topic Modeling · Explainable Artificial Intelligence (XAI)
