Neural Article Pair Modeling for Wikipedia Sub-article Matching
Muhao Chen, Changping Meng, Gang Huang, Carlo Zaniolo

TL;DR
This paper introduces a neural model for matching Wikipedia sub-articles to their main articles, improving automated curation and knowledge extraction by addressing the fragmentation caused by article separation.
Contribution
It proposes a hierarchical neural network model with explicit features for sub-article matching, supported by a large crowdsourced dataset, outperforming previous methods.
Findings
The model achieves superior cross-validation results.
It scales effectively to the entire Wikipedia for large-scale article pairing.
Outperforms previous approaches in sub-article matching accuracy.
Abstract
Nowadays, editors tend to separate different subtopics of a long Wikipedia article into multiple sub-articles. This separation seeks to improve human readability. However, it also has a deleterious effect on many Wikipedia-based tasks that rely on the article-as-concept assumption, which requires each entity (or concept) to be described solely by one article. This underlying assumption significantly simplifies knowledge representation and extraction, and it is vital to many existing technologies such as automated knowledge base construction, cross-lingual knowledge alignment, semantic search and data lineage of Wikipedia entities. In this paper we provide an approach to match the scattered sub-articles back to their corresponding main-articles, with the intent of facilitating automated Wikipedia curation and processing. The proposed model adopts a hierarchical learning structure that…
Taxonomy
Topics: Wikis in Education and Collaboration · Natural Language Processing Techniques · Topic Modeling
