Descartes: Generating Short Descriptions of Wikipedia Articles
Marija Sakota, Maxime Peyrard, Robert West

TL;DR
This paper introduces Descartes, a multilingual model that automatically generates short descriptions for Wikipedia articles. It combines three inputs: the article text, existing descriptions in other languages, and semantic type information, with the goal of improving both the coverage and the quality of short descriptions.
Contribution
The paper defines the task of automatically generating Wikipedia short descriptions and presents a multilingual model for it. The model outperforms baselines and matches monolingual models in quality, which suggests potential for real-world use.
Findings
Descartes outperforms baseline models, including translation-based methods.
The model achieves human-level quality in description generation.
91.3% of Descartes's English descriptions pass Wikipedia inclusion standards.
Abstract
Wikipedia is one of the richest knowledge sources on the Web today. In order to facilitate navigating, searching, and maintaining its content, Wikipedia's guidelines state that all articles should be annotated with a so-called short description indicating the article's topic (e.g., the short description of beer is "Alcoholic drink made from fermented cereal grains"). Nonetheless, a large fraction of articles (ranging from 10.2% in Dutch to 99.7% in Kazakh) have no short description yet, with detrimental effects for millions of Wikipedia users. Motivated by this problem, we introduce the novel task of automatically generating short descriptions for Wikipedia articles and propose Descartes, a multilingual model for tackling it. Descartes integrates three sources of information to generate an article description in a target language: the text of the article in all its language versions,…
Taxonomy
Topics: Natural Language Processing Techniques · Wikis in Education and Collaboration · Cancer-related gene regulation
