Open4Business (O4B): An Open Access Dataset for Summarizing Business Documents
Amanpreet Singh, Niranjan Balasubramanian

TL;DR
Open4Business (O4B) is a large open access dataset of 17,458 business articles with reference summaries, designed to advance domain-specific summarization research and evaluate model performance in the business context.
Contribution
This work introduces the first large-scale open access dataset for business document summarization, enabling domain-specific model training and evaluation.
Findings
Models trained on O4B perform comparably to models trained on a 7x larger non-open access dataset.
O4B presents a new challenge requiring highly abstractive and concise summaries.
The dataset and code are publicly released for further research.
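The "highly abstractive and concise" finding is the kind of claim usually quantified with two simple statistics: the fraction of summary n-grams that never occur in the source article (abstractiveness) and the article-to-summary length ratio (conciseness). The sketch below illustrates both under stated assumptions: whitespace tokenization, lowercasing, and the hypothetical function names `novel_ngram_fraction` and `compression_ratio`. It is not the authors' evaluation code, and the example texts are invented for illustration.

```python
def ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_fraction(article, summary, n=2):
    """Fraction of distinct summary n-grams that never appear in the article.

    Higher values indicate a more abstractive summary.
    Assumption: lowercased whitespace tokenization.
    """
    article_ngrams = ngrams(article.lower().split(), n)
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        return 0.0
    return len(summary_ngrams - article_ngrams) / len(summary_ngrams)


def compression_ratio(article, summary):
    """Article length divided by summary length, in whitespace tokens.

    Higher values indicate a more concise summary.
    """
    summary_len = len(summary.split())
    if summary_len == 0:
        raise ValueError("summary is empty")
    return len(article.split()) / summary_len


# Invented toy example, not taken from the dataset.
article = "the firm raised prices and the firm cut costs to protect margins"
summary = "the firm protected margins"

print(novel_ngram_fraction(article, summary, n=1))  # 0.25
print(compression_ratio(article, summary))          # 3.0
```

Averaging these two numbers over every article-summary pair in a corpus gives a rough basis for comparing one summarization dataset with another.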
Abstract
A major challenge in fine-tuning deep learning models for automatic summarization is the need for large domain-specific datasets. One of the barriers to curating such data from resources like online publications is navigating the license regulations applicable to their re-use, especially for commercial purposes. As a result, despite the availability of several business journals, there are no large-scale datasets for summarizing business documents. In this work, we introduce Open4Business (O4B), a dataset of 17,458 open access business articles and their reference summaries. The dataset introduces a new challenge for summarization in the business domain, requiring summaries that are more abstractive and more concise than those in other existing datasets. Additionally, we evaluate existing models on it and consequently show that models trained on O4B and a 7x larger non-open access dataset achieve…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
