Occurrence Statistics of Entities, Relations and Types on the Web

Aman Madaan; Sunita Sarawagi

arXiv:1605.04359 · cs.CL · May 17, 2016

TL;DR

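This report addresses the problem of estimating how often entities occur on the open web, where entity taggers perform poorly because the distribution of entities on the web differs sharply from that in the small training data. It makes the case for maximum mean discrepancy as the estimation tool and reviews named entity disambiguation techniques along the way.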

Contribution

The report argues for applying maximum mean discrepancy to estimate occurrence statistics of entities on the web, as a way to cope with the mismatch between the web distribution and the training distribution.

Findings

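01. The report makes the case that maximum mean discrepancy is suited to estimating entity occurrence statistics on the web.

02. The distribution of entities on the web differs severely from that in the comparatively small training data, so models trained for entity tagging cannot be expected to perform well on the web.

03. The report reviews named entity disambiguation techniques and related concepts.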

Abstract

The problem of collecting reliable estimates of occurrence of entities on the open web forms the premise for this report. The models learned for tagging entities cannot be expected to perform well when deployed on the web. This is owing to the severe mismatch in the distributions of such entities on the web and in the relatively diminutive training data. In this report, we build up the case for maximum mean discrepancy for estimation of occurrence statistics of entities on the web, taking a review of named entity disambiguation techniques and related concepts along the way.
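To make the central quantity concrete, the following is a minimal sketch of a biased empirical estimate of squared maximum mean discrepancy between two samples. It is not the report's implementation: the Gaussian (RBF) kernel, the `gamma` value, the function names, and the toy feature vectors are all assumptions chosen for illustration. The estimate is the mean kernel value within the first sample, plus the mean within the second, minus twice the mean across the two.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    return (mean_kernel(xs, xs, gamma)
            + mean_kernel(ys, ys, gamma)
            - 2.0 * mean_kernel(xs, ys, gamma))


# Hypothetical toy data: feature vectors for training mentions and for two
# web samples, one drawn near the training data and one shifted away from it.
train = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3)]
web_similar = [(0.1, 0.1), (0.0, 0.2), (0.2, 0.2)]
web_shifted = [(2.0, 2.1), (2.2, 1.9), (1.8, 2.0)]

print("train vs similar web sample:", mmd_squared(train, web_similar))
print("train vs shifted web sample:", mmd_squared(train, web_shifted))
```

A larger value indicates a larger gap between the two samples in the kernel's feature space, which is the kind of train/web mismatch the abstract describes. The kernel and its bandwidth strongly affect the value, so they would need tuning on real data.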

Taxonomy

Topics: Complex Network Analysis Techniques · Topic Modeling · Web Data Mining and Analysis