sui-1: Grounded and Verifiable Long-Form Summarization
Benedikt Droste, Jan Philipp Harries, Maximilian Idahl, Björn Plüster

TL;DR
sui-1 is a 24B-parameter model that generates verifiable, citation-grounded summaries across multiple languages. Trained on synthetic data produced with multi-stage verification, it significantly outperforms larger open-weight models.
Contribution
Introduces sui-1, a model that produces traceable summaries with inline citations, trained on data from a novel synthetic pipeline that uses multi-stage verification to improve faithfulness.
Findings
sui-1 outperforms all tested open-weight baselines in citation-grounded summarization, including models with 3x more parameters
Task-specific training outperforms scale alone for this task
Model and demo are publicly available
Abstract
Large language models frequently generate plausible but unfaithful summaries that users cannot verify against source text, a critical limitation in compliance-sensitive domains such as government and legal analysis. We present sui-1, a 24B parameter model that produces abstractive summaries with inline citations, enabling users to trace each claim to its source sentence. Our synthetic data pipeline combines chain-of-thought prompting with multi-stage verification, generating over 22,000 high-quality training examples across five languages from diverse sources including parliamentary documents, web text, and Wikipedia. Evaluation shows sui-1 significantly outperforms all tested open-weight baselines, including models with 3x more parameters. These results demonstrate that task-specific training substantially outperforms scale alone for citation-grounded summarization. Model weights and…
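To make the idea of tracing each claim to a source sentence concrete, here is a minimal sketch of a structural citation check. It is not the authors' verification pipeline: the bracketed marker format `[n]`, the function name `check_citations`, and the example sentences are all assumptions made for illustration. The check only confirms that every cited index exists in the source and flags summary sentences with no citation. It says nothing about whether the cited sentence actually supports the claim.

```python
import re

# Hypothetical citation marker: a bracketed, 1-based source-sentence index such as "[2]".
CITATION = re.compile(r"\[(\d+)\]")

def check_citations(summary_sentences, source_sentences):
    """Return (dangling, uncited) for a summary that uses inline [n] citations."""
    dangling = []  # (summary index, cited index) pairs that point outside the source
    uncited = []   # indices of summary sentences that carry no citation at all
    for i, sentence in enumerate(summary_sentences):
        cited = [int(n) for n in CITATION.findall(sentence)]
        if not cited:
            uncited.append(i)
        for n in cited:
            if not 1 <= n <= len(source_sentences):
                dangling.append((i, n))
    return dangling, uncited

# Made-up example data, for illustration only.
source = [
    "The committee met on Monday.",
    "It approved the budget.",
]
summary = [
    "The committee approved the budget [2].",
    "The vote was unanimous.",
    "Members met early in the week [1][5].",
]

dangling, uncited = check_citations(summary, source)
```

Here the second summary sentence is reported as uncited and the marker `[5]` as dangling, since the source has only two sentences. A real faithfulness check would additionally need to compare each claim against the content of the sentence it cites.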
Taxonomy
Topics: Topic Modeling · Misinformation and Its Impacts · Text Readability and Simplification
