# Generating an Overview Report over Many Documents

**Authors:** Jingwen Wang, Hao Zhang, Cheng Zhang, Wenjing Yang, Liqun Shao, Jie Wang

arXiv: 1908.06216 · 2019-08-20

## TL;DR

This paper introduces NDORGS, a novel scheme for generating comprehensive, well-structured overview reports from thousands of related documents, combining multiple NLP techniques and a multi-criteria evaluation method.

## Contribution

The paper presents NDORGS, a new integrated approach for multi-document overview report generation that addresses the lack of existing algorithms for this specific task.

## Key findings

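- NDORGS generates coherent, well-structured overview reports from thousands of related documents.
- Report quality is best overall when the single-document summarization (SDS) summaries are 20% of the original document length.
- Under the multi-criteria evaluation (Saaty's 9-point pairwise comparison scale with TOPSIS), this 20% setting ranks best on both corpora, one classified and one unclassified.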

## Abstract

How to efficiently generate an accurate, well-structured overview report (ORPT) over thousands of related documents is challenging. A well-structured ORPT consists of sections of multiple levels (e.g., sections and subsections). None of the existing multi-document summarization (MDS) algorithms is directed toward this task. To overcome this obstacle, we present NDORGS (Numerous Documents' Overview Report Generation Scheme) that integrates text filtering, keyword scoring, single-document summarization (SDS), topic modeling, MDS, and title generation to generate a coherent, well-structured ORPT. We then devise a multi-criteria evaluation method using techniques of text mining and multi-attribute decision making on a combination of human judgments, running time, information coverage, and topic diversity. We evaluate ORPTs generated by NDORGS on two large corpora of documents, where one is classified and the other unclassified. We show that, using Saaty's pairwise comparison 9-point scale and under TOPSIS, the ORPTs generated on SDS's with the length of 20% of the original documents are the best overall on both datasets.
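
The ranking step named in the abstract can be illustrated with a minimal sketch. It assumes criterion weights are derived from a Saaty-style pairwise comparison matrix using the row geometric mean approximation, and that alternatives are then ranked with standard TOPSIS. The function names `ahp_weights` and `topsis`, the pairwise matrix, and the score table are all hypothetical toy values for illustration; they are not taken from the paper, and the paper's exact procedure may differ.

```python
import math

def ahp_weights(pairwise):
    """Approximate priority weights from a reciprocal pairwise comparison
    matrix using the row geometric mean method (an assumption here; the
    principal eigenvector is the other common choice)."""
    n = len(pairwise)
    geo = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo)
    return [g / total for g in geo]

def topsis(matrix, weights, benefit):
    """Return TOPSIS closeness scores in [0, 1]; higher is better.

    matrix[i][j] is the value of alternative i on criterion j.
    benefit[j] is True when larger is better, False when smaller is better.
    Assumes no criterion column is all zeros and alternatives are not identical.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    # Ideal and anti-ideal points depend on whether a criterion is a benefit or a cost.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical criteria order: human judgment, running time,
# information coverage, topic diversity. Entries use Saaty's 1-9 scale.
pairwise = [
    [1,     3, 2,     2],
    [1 / 3, 1, 1 / 2, 1 / 2],
    [1 / 2, 2, 1,     1],
    [1 / 2, 2, 1,     1],
]
weights = ahp_weights(pairwise)

# Hypothetical toy scores for three report configurations (rows).
table = [
    [7.0, 120.0, 0.60, 0.50],
    [8.0, 150.0, 0.70, 0.55],
    [8.2, 300.0, 0.72, 0.56],
]
benefit = [True, False, True, True]  # running time is a cost criterion
scores = topsis(table, weights, benefit)
best = max(range(len(scores)), key=scores.__getitem__)
print(weights, scores, best)
```

Treating running time as a cost criterion means its ideal value is the column minimum, so a slower configuration is penalized even when it scores slightly higher on the other three criteria.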

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.06216/full.md

## Figures

24 figures with captions in the complete paper: https://tomesphere.com/paper/1908.06216/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1908.06216/full.md

---
Source: https://tomesphere.com/paper/1908.06216