OpenFActScore: Open-Source Atomic Evaluation of Factuality in Text Generation
Lucas Fonseca Lage, Simon Ostermann

TL;DR
OpenFActScore is an open-source framework for evaluating the factual accuracy of text generated by large language models. It extracts atomic facts from the text and validates each one, which keeps the evaluation transparent and reproducible.
Contribution
OpenFActScore adapts the FActScore framework to support open-source models, enabling broader access and reproducibility in factuality evaluation of LLM outputs.
Findings
Open models can approximate the performance of closed-source systems.
Gemma achieved the best overall performance among open models.
The final setup correlates highly (0.99 Pearson) with the original FActScore results.
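The 0.99 figure is a Pearson correlation coefficient, which measures how closely two sets of scores follow a linear relationship. The following is a minimal sketch of that computation in plain Python. The function name `pearson` and its arguments are hypothetical, chosen for illustration; nothing here is taken from the OpenFActScore code, and no values from the paper are reproduced.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists.

    Hypothetical helper for illustration: xs could hold scores from one
    evaluation setup and ys the scores from another, one entry per system.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")

    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the two setups rank and space the evaluated systems almost identically, even if their absolute scores differ.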
Abstract
We introduce OpenFActScore, an open-source implementation of the FActScore framework for evaluating the factuality of text generated by large language models (LLMs). FActScore evaluates the factual accuracy of long-form text by using Atomic Fact Generation (AFG) to extract individual factual claims and Atomic Fact Validation (AFV) to verify each claim against a trusted knowledge source. While the original FActScore relies on closed-source and commercial models such as InstructGPT and ChatGPT, OpenFActScore enables the use of any Hugging Face-compatible model for both AFG and AFV. We provide a detailed technical overview of our implementation, highlighting design choices and modifications made to support open models. We evaluate multiple open-source LLMs on both AFG and AFV using the original FActScore benchmark, reporting BERTScore-F1 for AFG and Error Rate relative to human annotations…
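The two-stage pipeline described in the abstract can be sketched as follows. This is a toy illustration, not the OpenFActScore implementation: it assumes the score is the fraction of extracted atomic facts that the knowledge source supports, as in the original FActScore definition. All function names are hypothetical, and both stages are replaced by trivial stand-ins (sentence splitting for AFG, exact string lookup for AFV) where the real framework would call a language model.

```python
from typing import Callable, List


def split_into_atomic_facts(text: str) -> List[str]:
    """Toy stand-in for AFG: treat each sentence as one atomic fact."""
    return [part.strip() for part in text.split(".") if part.strip()]


def make_lookup_validator(knowledge: List[str]) -> Callable[[str], bool]:
    """Toy stand-in for AFV: a fact counts as supported only if it
    appears verbatim (case-insensitively) in the knowledge source."""
    known = {entry.lower() for entry in knowledge}
    return lambda fact: fact.lower() in known


def factuality_score(
    text: str,
    extract: Callable[[str], List[str]],
    validate: Callable[[str], bool],
) -> float:
    """Fraction of extracted atomic facts that the validator supports."""
    facts = extract(text)
    if not facts:
        return 0.0  # assumption: text with no facts scores 0
    supported = sum(1 for fact in facts if validate(fact))
    return supported / len(facts)


# Illustrative input: one supported claim and one unsupported claim.
generated = "Marie Curie was born in Warsaw. Marie Curie won three Nobel Prizes."
knowledge_source = [
    "Marie Curie was born in Warsaw",
    "Marie Curie won two Nobel Prizes",
]
score = factuality_score(
    generated,
    split_into_atomic_facts,
    make_lookup_validator(knowledge_source),
)
print(score)  # 0.5: one of the two extracted facts is supported
```

Passing the extractor and validator as plain callables mirrors the idea that either stage can be backed by a different model without changing the scoring logic.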
Taxonomy
Topics: Misinformation and Its Impacts · Artificial Intelligence in Healthcare and Education · Topic Modeling
