Product/Brand extraction from WikiPedia

K. Massoudi; G. Modena

arXiv:1212.3013·cs.IR·December 14, 2012

Open Access

TL;DR

This paper presents a method for extracting product and brand pages from Wikipedia using a probabilistic classification approach, demonstrating promising results and discussing alternative methods.

Contribution

It frames the recognition of product and brand pages in Wikipedia as a boolean probabilistic classification task, and provides an experimental setup built on a dataset of Wikipedia pages collected by the authors.

Findings

01

The boolean probabilistic classifier produced promising results in recognizing product pages.

02

The experimental environment facilitates future research in Wikipedia page classification.

03

Alternative approaches were considered and are discussed alongside the probabilistic method.

Abstract

In this paper we describe the task of extracting product and brand pages from Wikipedia. We present an experimental environment and setup built on top of a dataset of Wikipedia pages that we collected. We introduce a method for recognizing product pages, modelled as a boolean probabilistic classification task. We show that this approach can lead to promising results, and we discuss the alternative approaches we considered.
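As a rough illustration of what "a boolean probabilistic classification task" can look like, the sketch below trains a small Bernoulli naive Bayes classifier that outputs the probability that a page is a product page. This is an assumption made for illustration only: the abstract does not state which model or features the authors used. The feature names (`infobox_product`, `has_manufacturer`, and so on), the toy training set, and the functions `train` and `prob_product` are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy data: each page is a set of boolean features plus a label
# (True = product/brand page). Feature names are illustrative assumptions,
# not taken from the paper.
TRAIN = [
    ({"infobox_product", "has_manufacturer"}, True),
    ({"infobox_brand", "has_manufacturer"}, True),
    ({"infobox_person", "has_birth_date"}, False),
    ({"infobox_settlement"}, False),
]

def train(docs):
    """Estimate class priors and per-feature Bernoulli parameters.

    Assumes both classes occur at least once in docs.
    """
    vocab = set().union(*(feats for feats, _ in docs))
    n = {True: 0, False: 0}
    counts = {True: defaultdict(int), False: defaultdict(int)}
    for feats, label in docs:
        n[label] += 1
        for f in feats:
            counts[label][f] += 1
    prior = {c: n[c] / len(docs) for c in (True, False)}
    # Laplace smoothing keeps every probability strictly between 0 and 1.
    cond = {
        c: {f: (counts[c][f] + 1) / (n[c] + 2) for f in vocab}
        for c in (True, False)
    }
    return vocab, prior, cond

def prob_product(model, feats):
    """Return P(product | feats) under the Bernoulli naive Bayes assumption."""
    vocab, prior, cond = model
    log_score = {}
    for c in (True, False):
        s = math.log(prior[c])
        for f in vocab:
            p = cond[c][f]
            # Both present and absent features contribute evidence.
            s += math.log(p if f in feats else 1.0 - p)
        log_score[c] = s
    # Normalize in log space for numerical stability.
    m = max(log_score.values())
    z = sum(math.exp(v - m) for v in log_score.values())
    return math.exp(log_score[True] - m) / z

model = train(TRAIN)
p_yes = prob_product(model, {"infobox_product", "has_manufacturer"})
p_no = prob_product(model, {"infobox_person"})
```

With this toy data, `p_yes` comes out above 0.5 and `p_no` below it, so thresholding at 0.5 yields the boolean decision. Features outside the training vocabulary are simply ignored in this sketch.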

Taxonomy

Topics: Web Data Mining and Analysis · Wikis in Education and Collaboration · Natural Language Processing Techniques