Information Extraction from Broadcast News
Yoshihiko Gotoh, Steve Renals

TL;DR
This paper presents statistical finite state models for extracting named entities, in particular proper names, from broadcast news speech, and discusses the smoothing needed to cope with sparse training data.
Contribution
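It introduces two finite state models for named entity recognition in broadcast news: one represents name class information as a word attribute, and the other models word-word and class-class transitions explicitly. Both use a common n-gram based formulation.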
Findings
Both models are evaluated on the DARPA/NIST Hub-4E North American Broadcast News task.
Explicit class transition modeling improves recognition accuracy.
Smoothing techniques are crucial for sparse training data.
Abstract
This paper discusses the development of trainable statistical models for extracting content from television and radio news broadcasts. In particular we concentrate on statistical finite state models for identifying proper names and other named entities in broadcast speech. Two models are presented: the first represents name class information as a word attribute; the second represents both word-word and class-class transitions explicitly. A common n-gram based formulation is used for both models. The task of named entity identification is characterized by relatively sparse training data and issues related to smoothing are discussed. Experiments are reported using the DARPA/NIST Hub-4E evaluation for North American Broadcast News.
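To make the first model in the abstract concrete, the following is a minimal sketch of tagging with a bigram over (word, class) pairs, where the name class is treated as an attribute of each word. It is not the authors' implementation. The toy corpus, the class labels, the function names, and the smoothing scheme (linear interpolation of the bigram with an add-k unigram, with hypothetical weights `lam` and `k`) are all assumptions made here for illustration; the abstract does not say which smoothing method the paper uses.

```python
import math
from collections import Counter

# Toy annotated corpus, invented for illustration: each token is (word, class).
TRAIN = [
    [("president", "O"), ("clinton", "PERSON"), ("visited", "O"), ("boston", "LOCATION")],
    [("clinton", "PERSON"), ("spoke", "O"), ("in", "O"), ("boston", "LOCATION")],
    [("the", "O"), ("president", "O"), ("spoke", "O"), ("in", "O"), ("washington", "LOCATION")],
]
START = ("<s>", "O")  # assumed sentence-start token
CLASSES = sorted({c for sent in TRAIN for _, c in sent})


def train(corpus):
    """Count unigrams and bigrams over (word, class) tokens."""
    unigrams, bigrams, context = Counter(), Counter(), Counter()
    for sent in corpus:
        prev = START
        for tok in sent:
            unigrams[tok] += 1
            bigrams[(prev, tok)] += 1
            context[prev] += 1
            prev = tok
    return unigrams, bigrams, context


def log_prob(tok, prev, model, classes, lam=0.7, k=0.1):
    """log P(tok | prev): bigram estimate interpolated with an add-k unigram."""
    unigrams, bigrams, context = model
    vocab = len({w for w, _ in unigrams}) + 1  # +1 stands in for unseen words
    total = sum(unigrams.values())
    p_uni = (unigrams[tok] + k) / (total + k * vocab * len(classes))
    p_bi = bigrams[(prev, tok)] / context[prev] if context[prev] else 0.0
    return math.log(lam * p_bi + (1 - lam) * p_uni)


def tag(words, model, classes):
    """Viterbi search for the most likely class sequence for the given words."""
    # best[c] = (log score, class path) of the best path ending in class c
    best = {c: (log_prob((words[0], c), START, model, classes), [c]) for c in classes}
    for i in range(1, len(words)):
        new = {}
        for c in classes:
            tok = (words[i], c)
            new[c] = max(
                (score + log_prob(tok, (words[i - 1], pc), model, classes), path + [c])
                for pc, (score, path) in best.items()
            )
        best = new
    return max(best.values())[1]


MODEL = train(TRAIN)
print(tag(["clinton", "visited", "washington"], MODEL, CLASSES))
```

In this sketch the unigram term is what lets the tagger label "washington" after "visited", a word pair that never occurs in the toy training data. This is the kind of sparsity that the abstract's discussion of smoothing is concerned with.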
