Product/Brand extraction from WikiPedia
K. Massoudi, G. Modena

TL;DR
This paper presents a method for extracting product and brand pages from Wikipedia using a probabilistic classification approach, demonstrating promising results and discussing alternative methods.
Contribution
It introduces a method for recognising product pages in Wikipedia, modelled as a boolean probabilistic classification task, together with an experimental environment built on a dataset of Wikipedia pages the authors collected.
Findings
Modelling product-page recognition as a boolean probabilistic classification task can lead to promising results.
The experimental environment and setup are built on top of a dataset of Wikipedia pages collected by the authors.
Alternative approaches that the authors considered are also discussed.
Abstract
In this paper we describe the task of extracting product and brand pages from Wikipedia. We present an experimental environment and setup built on top of a dataset of Wikipedia pages we collected. We introduce a method for recognition of product pages modelled as a boolean probabilistic classification task. We show that this approach can lead to promising results, and we discuss alternative approaches we considered.
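The abstract does not say which probabilistic model the authors use. As a rough illustration of what a boolean probabilistic classification of pages can look like, the sketch below trains a Bernoulli naive Bayes classifier on word presence. The choice of naive Bayes, the class name `BernoulliNaiveBayes`, the `tokenize` helper, and the four toy snippets are all assumptions made for this example; none of them come from the paper or its dataset.

```python
import math
from collections import Counter


def tokenize(text):
    """Return the set of lowercase whitespace-separated tokens (presence only)."""
    return set(text.lower().split())


class BernoulliNaiveBayes:
    """Illustrative binary classifier; not the authors' actual model."""

    def fit(self, docs, labels):
        self.vocab = set().union(*(tokenize(d) for d in docs))
        self.classes = sorted(set(labels))
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        # Per-class document frequency of each word.
        doc_freq = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            doc_freq[label].update(tokenize(doc))
        # P(word present | class), with Laplace smoothing.
        self.p_word = {
            c: {w: (doc_freq[c][w] + 1) / (n_docs[c] + 2) for w in self.vocab}
            for c in self.classes
        }
        return self

    def predict_proba(self, doc):
        words = tokenize(doc)
        log_scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in self.vocab:
                p = self.p_word[c][w]
                score += math.log(p if w in words else 1.0 - p)
            log_scores[c] = score
        # Normalise in log space for numerical stability.
        top = max(log_scores.values())
        total = sum(math.exp(s - top) for s in log_scores.values())
        return {c: math.exp(s - top) / total for c, s in log_scores.items()}


# Hypothetical toy data: True = product/brand page, False = anything else.
train_docs = [
    "smartphone manufactured and sold by the company",
    "soft drink brand sold worldwide",
    "river flowing through the northern region",
    "composer born in the nineteenth century",
]
train_labels = [True, True, False, False]

model = BernoulliNaiveBayes().fit(train_docs, train_labels)
proba = model.predict_proba("a brand of smartphone sold worldwide")
print(proba[True] > 0.5)
```

The classifier returns a probability for each of the two classes, so the product/non-product decision reduces to comparing `proba[True]` against a threshold such as 0.5. A real setup would need far more pages and richer features than whitespace tokens.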
Taxonomy
Topics: Web Data Mining and Analysis · Wikis in Education and Collaboration · Natural Language Processing Techniques
