AraFinNews: Arabic Financial Summarisation with Domain-Adapted LLMs
Mo El-Haj, Paul Rayson

TL;DR
AraFinNews is a large Arabic financial news dataset for evaluating domain-specific language models. Experiments on it show that domain adaptation improves summarisation quality, especially for quantitative and entity-related information.
Contribution
The paper introduces AraFinNews, the largest publicly available Arabic financial news dataset to date (212,500 article-headline pairs from 2015 to 2025), and evaluates the impact of domain-adapted LLMs on financial text summarisation.
Findings
Domain-adapted models produce more coherent summaries.
Financial domain pretraining improves numerical and entity handling.
The dataset facilitates benchmarking Arabic financial NLP tasks.
Abstract
We introduce AraFinNews, the largest publicly available Arabic financial news dataset to date, comprising 212,500 article-headline pairs spanning a decade of reporting from 2015 to 2025. Designed as an Arabic counterpart to major English summarisation corpora such as CNN/DailyMail, AraFinNews provides a realistic benchmark for evaluating domain-specific language understanding and generation in financial contexts. Using this resource, we investigate the impact of domain specificity on abstractive summarisation of Arabic financial texts with large language models (LLMs). In particular, we evaluate three transformer-based models (mT5, AraT5, and the domain-adapted FinAraT5) to examine how financial-domain pretraining influences accuracy, numerical reliability, and stylistic alignment with professional reporting. Experimental results show that domain-adapted models generate more coherent…
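The abstract names numerical reliability as one axis of evaluation but, as excerpted here, does not say how it is measured. As a rough illustration only, the sketch below checks what fraction of the numbers in a generated headline also appear in the source article. The function names `extract_numbers` and `numeric_support` are hypothetical, and this simple overlap check is an assumption for illustration, not the paper's metric.

```python
import re

# Map Arabic-Indic digits and the Arabic decimal/thousands separators to ASCII
# so that "١٢٫٥" and "12.5" compare as the same number.
_ARABIC_TO_ASCII = str.maketrans("٠١٢٣٤٥٦٧٨٩٫٬", "0123456789.,")

# A run of digits, optionally followed by groups joined with "." or ",".
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def extract_numbers(text: str) -> set[str]:
    """Return the set of numbers in `text`, normalised to ASCII without commas."""
    normalised = text.translate(_ARABIC_TO_ASCII)
    return {m.group().replace(",", "") for m in _NUMBER.finditer(normalised)}


def numeric_support(article: str, summary: str) -> float:
    """Fraction of numbers in `summary` that also occur in `article`.

    Hypothetical check (an assumption, not the paper's metric). A summary
    with no numbers is treated as fully supported and scores 1.0.
    """
    summary_numbers = extract_numbers(summary)
    if not summary_numbers:
        return 1.0
    article_numbers = extract_numbers(article)
    return len(summary_numbers & article_numbers) / len(summary_numbers)


if __name__ == "__main__":
    article = "ارتفعت أرباح الشركة بنسبة 12.5% إلى 1,250 مليون ريال في 2024"
    faithful = "أرباح الشركة ترتفع ١٢٫٥% في 2024"
    unfaithful = "أرباح الشركة ترتفع 15% في 2024"
    print(numeric_support(article, faithful))
    print(numeric_support(article, unfaithful))
```

A check like this only catches numbers that were invented or altered. It does not verify that a correct number is attached to the right entity or unit, so it would complement a fuller evaluation rather than replace it.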
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Sentiment Analysis and Opinion Mining
