Neural Article Pair Modeling for Wikipedia Sub-article Matching

Muhao Chen; Changping Meng; Gang Huang; Carlo Zaniolo

arXiv:1807.11689 · cs.IR · June 24, 2019 · 1 citation

TL;DR

This paper introduces a neural model for matching Wikipedia sub-articles to their main articles, improving automated curation and knowledge extraction by addressing the fragmentation caused by article separation.

Contribution

It proposes a hierarchical neural network model with explicit features for sub-article matching, supported by a large crowdsourced dataset, outperforming previous methods.

Findings

01. The model achieves superior cross-validation results.

02. It scales effectively to the entire Wikipedia for large-scale article pairing.

03. It outperforms previous approaches in sub-article matching accuracy.

Abstract

Nowadays, editors tend to separate different subtopics of a long Wikipedia article into multiple sub-articles. This separation seeks to improve human readability. However, it also has a deleterious effect on many Wikipedia-based tasks that rely on the article-as-concept assumption, which requires each entity (or concept) to be described solely by one article. This underlying assumption significantly simplifies knowledge representation and extraction, and it is vital to many existing technologies such as automated knowledge base construction, cross-lingual knowledge alignment, semantic search and data lineage of Wikipedia entities. In this paper we provide an approach to match the scattered sub-articles back to their corresponding main-articles, with the intent of facilitating automated Wikipedia curation and processing. The proposed model adopts a hierarchical learning structure that…

Code & Models

Repositories

muhaochen/subarticle · tf · Official

Taxonomy

Topics: Wikis in Education and Collaboration · Natural Language Processing Techniques · Topic Modeling