PLSUM: Generating PT-BR Wikipedia by Summarizing Multiple Websites
André Seidel Oliveira, Anna Helena Reali Costa

TL;DR
This paper introduces PLSUM, a framework that generates Wikipedia-style summaries in Brazilian Portuguese by combining extractive and abstractive methods, utilizing Transformer models fine-tuned on a new dataset linking web sources to Wikipedia.
Contribution
The paper presents a novel framework for automatic Wikipedia content generation in Portuguese, including a new dataset and a comparison of Transformer-based models for abstractive summarization.
Findings
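Fine-tuned Transformer models (PTT5 and Longformer) can generate meaningful abstractive summaries from Brazilian Portuguese web content.
The framework chains an extractive stage with an abstractive stage to summarize multiple websites.
Generated summaries could help expand the coverage of the Brazilian Portuguese Wikipedia.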
Abstract
Wikipedia is an important free source of intelligible knowledge. Despite that, Brazilian Portuguese Wikipedia still lacks descriptions for many subjects. In an effort to expand the Brazilian Wikipedia, we contribute PLSum, a framework for generating wiki-like abstractive summaries from multiple descriptive websites. The framework has an extractive stage followed by an abstractive one. In particular, for the abstractive stage, we fine-tune and compare two recent variations of the Transformer neural network, PTT5, and Longformer. To fine-tune and evaluate the model, we created a dataset with thousands of examples, linking reference websites to Wikipedia. Our results show that it is possible to generate meaningful abstractive summaries from Brazilian Portuguese web content.
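The two-stage design described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the TF-IDF-style ranking of sentences against the article title, the function names `extract_sentences` and `summarize`, the `top_k` parameter, and the way the title is joined to the selected sentences. The abstractive stage is represented by a caller-supplied function, which in practice would wrap a fine-tuned sequence-to-sequence model such as PTT5 or Longformer.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; the word-character class covers accented letters.
    return re.findall(r"\w+", text.lower())


def extract_sentences(title: str, documents: List[str], top_k: int = 3) -> List[str]:
    """Extractive stage (assumed): keep the sentences most related to the title."""
    # Split every source document into sentences on ., ! or ? followed by space.
    sentences = [
        s.strip()
        for doc in documents
        for s in re.split(r"(?<=[.!?])\s+", doc)
        if s.strip()
    ]
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Sentence-level document frequency of each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query = set(tokenize(title))

    def score(toks: List[str]) -> float:
        if not toks:
            return 0.0
        tf = Counter(toks)
        # Smoothed TF-IDF mass of the title terms inside this sentence.
        return sum(
            (tf[t] / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t in query
        )

    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    return [sentences[i] for i in ranked[:top_k]]


def summarize(
    title: str,
    documents: List[str],
    abstractive_model: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """Run the extractive stage, then hand its output to the abstractive stage."""
    selected = extract_sentences(title, documents, top_k)
    # Hypothetical input format: title followed by the selected sentences.
    return abstractive_model(f"{title}: " + " ".join(selected))


if __name__ == "__main__":
    docs = [
        "O jacaré é um réptil. Ele vive em rios e lagos da América do Sul.",
        "Compre ingressos agora! O jacaré se alimenta de peixes.",
    ]
    # Identity stub standing in for a fine-tuned abstractive model.
    print(summarize("jacaré", docs, abstractive_model=lambda text: text, top_k=2))
```

The extractive stage exists mainly to shrink many noisy web pages down to an input that a Transformer with a limited context window can process; the abstractive stage then rewrites that input into a wiki-like summary.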
Taxonomy
Topics: Natural Language Processing Techniques · Wikis in Education and Collaboration · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · AdamW · Weight Decay · Attention Dropout
