WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs
Hoang Thang Ta, Abu Bakar Siddiqur Rahman, Navonil Majumder, Amir Hussain, Lotfollah Najjar, Newton Howard, Soujanya Poria, Alexander Gelbukh

TL;DR
WikiDes is a new dataset for generating concise descriptions of Wikipedia articles. The accompanying two-phase summarization approach, built on transfer and contrastive learning, significantly improves description quality and reduces human effort.
Contribution
Introduces WikiDes, a large dataset for short description generation, and proposes a two-phase summarization method with transfer and contrastive learning techniques.
Findings
T5 and BART outperform other models in description generation.
Ranking models trained with contrastive learning on diverse beam-search candidates improve results by up to 22 ROUGE points.
Generated descriptions are preferred over initial candidates in human evaluations.
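This page does not state the exact contrastive objective used for candidate ranking. As an assumption, the sketch below uses a SimCLS-style pairwise margin ranking loss, a common choice for re-ranking beam-search candidates: candidates are ordered by a reference-based quality score, and the ranker is penalized whenever a worse candidate is not scored at least a rank-scaled margin below a better one. The function name `pairwise_margin_ranking_loss` and the `margin` parameter are hypothetical.

```python
from typing import Sequence


def pairwise_margin_ranking_loss(
    scores: Sequence[float], quality: Sequence[float], margin: float = 0.01
) -> float:
    """SimCLS-style contrastive ranking loss (an assumed stand-in, not the paper's exact loss).

    scores:  ranker scores for each beam-search candidate.
    quality: reference-based quality of each candidate (e.g. a fused ROUGE score);
             only used to define the target ordering.
    margin:  base margin, multiplied by the rank gap between two candidates.
    """
    if len(scores) != len(quality):
        raise ValueError("scores and quality must have the same length")
    # Candidate indices ordered from best to worst by quality.
    order = sorted(range(len(quality)), key=lambda k: quality[k], reverse=True)
    loss = 0.0
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            better, worse = order[i], order[j]
            # Hinge term: zero once the better candidate outscores the worse one
            # by at least (j - i) * margin.
            loss += max(0.0, scores[worse] - scores[better] + (j - i) * margin)
    return loss
```

A ranker whose scores already follow the quality order with enough separation incurs zero loss, while a fully reversed ordering is penalized on every pair. Scaling the margin by the rank gap pushes clearly better candidates further apart than near-ties.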
Abstract
As free online encyclopedias with massive volumes of content, Wikipedia and Wikidata are key to many Natural Language Processing (NLP) tasks, such as information retrieval, knowledge base building, machine translation, text classification, and text summarization. In this paper, we introduce WikiDes, a novel dataset for generating short descriptions of Wikipedia articles, framed as a text summarization problem. The dataset consists of over 80k English samples on 6,987 topics. We set up a two-phase summarization method, consisting of description generation (Phase I) and candidate ranking (Phase II), as a strong approach that relies on transfer and contrastive learning. For description generation, T5 and BART outperform other small-scale pre-trained models. By applying contrastive learning with the diverse input from beam search, the metric fusion-based ranking models outperform the…
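The abstract mentions metric fusion-based ranking of beam-search candidates but does not list which metrics are fused or how they are weighted. The sketch below assumes the fused score is the unweighted mean of ROUGE-1, ROUGE-2 and ROUGE-L F1 against the gold description, computed on lowercased whitespace tokens. It shows how Phase I candidates could be scored and ordered to supply targets for a Phase II ranker. The names `fused_score` and `rank_candidates` are hypothetical, and the ROUGE implementation is a simplified pure-Python version without stemming.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def _ngrams(tokens: Sequence[str], n: int) -> Counter:
    # Multiset of n-grams; empty when the sequence is shorter than n.
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap: int, pred_total: int, ref_total: int) -> float:
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    p, r = overlap / pred_total, overlap / ref_total
    return 2 * p * r / (p + r)


def rouge_n(pred: List[str], ref: List[str], n: int) -> float:
    p, r = _ngrams(pred, n), _ngrams(ref, n)
    overlap = sum((p & r).values())  # clipped n-gram matches
    return _f1(overlap, sum(p.values()), sum(r.values()))


def rouge_l(pred: List[str], ref: List[str]) -> float:
    # Longest common subsequence length via dynamic programming.
    dp = [[0] * (len(ref) + 1) for _ in range(len(pred) + 1)]
    for i, a in enumerate(pred, 1):
        for j, b in enumerate(ref, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if a == b else max(dp[i - 1][j], dp[i][j - 1])
    return _f1(dp[-1][-1], len(pred), len(ref))


def fused_score(candidate: str, reference: str) -> float:
    """Hypothetical metric fusion: mean of ROUGE-1, ROUGE-2 and ROUGE-L F1."""
    pred, ref = candidate.lower().split(), reference.lower().split()
    return (rouge_n(pred, ref, 1) + rouge_n(pred, ref, 2) + rouge_l(pred, ref)) / 3


def rank_candidates(candidates: Sequence[str], reference: str) -> List[Tuple[str, float]]:
    """Order beam-search candidates from best to worst by fused score."""
    scored = [(c, fused_score(c, reference)) for c in candidates]
    return sorted(scored, key=lambda cs: cs[1], reverse=True)


if __name__ == "__main__":
    beams = ["Irish poet", "American actor", "Irish poet and playwright"]
    for text, score in rank_candidates(beams, "Irish poet and playwright"):
        print(f"{score:.3f}  {text}")
```

Averaging several metrics rather than relying on a single one is meant to make the target ordering less sensitive to the quirks of any one overlap measure. A learned ranker would then be trained to reproduce this ordering without access to the reference at inference time.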
Taxonomy
Topics: Natural Language Processing Techniques · Wikis in Education and Collaboration · Topic Modeling
Methods: Linear Layer · Balanced Selection · Contrastive Learning · Weight Decay · Linear Warmup With Linear Decay · WordPiece · BERT · Gated Linear Unit · Inverse Square Root Schedule · Adafactor
