# Early Discovery of Emerging Entities in Microblogs

**Authors:** Satoshi Akasaki, Naoki Yoshinaga, Masashi Toyoda

arXiv: 1907.03513 · 2019-07-09

## TL;DR

This paper introduces the task of discovering truly emerging entities in microblogs shortly after they are first introduced to the public, along with a method based on time-sensitive distant supervision that outperforms baselines built on unseen entity recognition with burst detection.

## Contribution

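The paper defines a new task, discovering emerging entities at the moment they appear in microblogs rather than merely flagging entities missing from a knowledge base, and proposes a method based on time-sensitive distant supervision that exploits the distinctive early-stage contexts in which emerging entities are mentioned.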

## Key findings

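- Achieves 83.2% precision on the top 500 discovered emerging entities, evaluated on a large-scale Twitter archive
- Detects 80.4% of the emerging entities newly registered in Wikipedia (relative recall)
- 92.4% of those detected entities are discovered before their Wikipedia registration, with an average lead time of 571 days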
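The three figures above correspond to precision over a ranked top-k list, relative recall against a reference set, and lead time before registration. The sketch below shows one plausible way to compute such metrics. The function names, the input shapes, and the choice to average lead time over all entities that have both dates are assumptions made for illustration, not the paper's evaluation code.

```python
from datetime import date


def precision_at_k(ranked_candidates, is_emerging, k):
    """Fraction of the top-k ranked candidates judged to be truly emerging."""
    top = ranked_candidates[:k]
    if not top:
        return 0.0
    return sum(1 for c in top if is_emerging(c)) / len(top)


def relative_recall(discovered, reference):
    """Share of reference entities (e.g. newly registered in a knowledge base)
    that also appear among the discovered entities."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(reference & set(discovered)) / len(reference)


def mean_lead_time_days(discovery_dates, registration_dates):
    """Mean of (registration date - discovery date) in days, over entities
    present in both dicts. Positive values mean discovery came first.
    Assumption: every shared entity counts, including late discoveries."""
    common = discovery_dates.keys() & registration_dates.keys()
    if not common:
        return 0.0
    total = sum((registration_dates[e] - discovery_dates[e]).days for e in common)
    return total / len(common)


# Toy data, invented purely to exercise the functions.
gold = {"a", "c", "d"}
p_at_2 = precision_at_k(["a", "b", "c", "d"], gold.__contains__, 2)
recall = relative_recall(["a", "b"], ["a", "c"])
lead = mean_lead_time_days({"a": date(2019, 1, 1)}, {"a": date(2019, 1, 11)})
```

Relative recall is measured only against entities that eventually reach the reference set, so it says nothing about long-tail entities that are never registered.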

## Abstract

Keeping up to date on emerging entities that appear every day is indispensable for various applications, such as social-trend analysis and marketing research. Previous studies have attempted to detect unseen entities that are not registered in a particular knowledge base as emerging entities and consequently find non-emerging entities since the absence of entities in knowledge bases does not guarantee their emergence. We therefore introduce a novel task of discovering truly emerging entities when they have just been introduced to the public through microblogs and propose an effective method based on time-sensitive distant supervision, which exploits distinctive early-stage contexts of emerging entities. Experimental results with a large-scale Twitter archive show that the proposed method achieves 83.2% precision of the top 500 discovered emerging entities, which outperforms baselines based on unseen entity recognition with burst detection. Besides notable emerging entities, our method can discover massive long-tail and homographic emerging entities. An evaluation of relative recall shows that the method detects 80.4% emerging entities newly registered in Wikipedia; 92.4% of them are discovered earlier than their registration in Wikipedia, and the average lead-time is more than one year (571 days).
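The abstract gives no implementation details for time-sensitive distant supervision, so the following is only a minimal sketch of the general idea: posts that mention a known entity soon after its first appearance are treated as examples of early-stage context, and later posts as contrasting examples. The function `label_posts`, the 7-day window, substring matching, and the binary labels are all hypothetical choices made for illustration and should not be read as the authors' procedure.

```python
from datetime import datetime, timedelta


def label_posts(posts, entity, early_window=timedelta(days=7)):
    """Label posts mentioning `entity` by how soon they follow its first mention.

    posts: iterable of (timestamp, text) pairs.
    Returns (text, label) pairs in chronological order, where label is 1 for
    posts inside the early window and 0 for later posts.
    Assumptions: substring matching and a fixed window (both hypothetical).
    """
    mentions = sorted((ts, text) for ts, text in posts if entity in text)
    if not mentions:
        return []
    first_seen = mentions[0][0]
    return [
        (text, 1 if ts - first_seen <= early_window else 0)
        for ts, text in mentions
    ]


# Toy posts, invented for illustration.
posts = [
    (datetime(2019, 1, 1), "Foo announced today"),
    (datetime(2019, 1, 3), "can't wait for Foo"),
    (datetime(2019, 3, 1), "Foo is old news"),
    (datetime(2019, 1, 2), "unrelated post"),
]
labeled = label_posts(posts, "Foo")
```

Labels produced this way could then train a classifier or sequence tagger that recognizes early-stage contexts around entities it has not seen before. The model choice is left open here because the abstract does not specify it.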

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.03513/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1907.03513/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/1907.03513/full.md

---
Source: https://tomesphere.com/paper/1907.03513