Information Extraction from Broadcast News

Yoshihiko Gotoh; Steve Renals

arXiv:cs/0003084·cs.CL·October 31, 2009

TL;DR

This paper presents statistical finite state models for extracting named entities from broadcast news speech, focusing on proper names and addressing challenges like sparse data and smoothing.

Contribution

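The paper introduces two finite state models for named entity recognition in broadcast news: one that represents name class information as a word attribute, and one that models word-word and class-class transitions explicitly. Both share a common n-gram formulation, and the paper discusses the smoothing issues raised by sparse training data.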

Findings

01

Both models are evaluated on named entity extraction using the DARPA/NIST Hub-4E evaluation for North American Broadcast News.

02

Explicit class transition modeling improves recognition accuracy.

03

Smoothing techniques are crucial for sparse training data.

Abstract

This paper discusses the development of trainable statistical models for extracting content from television and radio news broadcasts. In particular, we concentrate on statistical finite state models for identifying proper names and other named entities in broadcast speech. Two models are presented: the first represents name class information as a word attribute; the second represents both word-word and class-class transitions explicitly. A common n-gram based formulation is used for both models. The task of named entity identification is characterized by relatively sparse training data, and issues related to smoothing are discussed. Experiments are reported using the DARPA/NIST Hub-4E evaluation for North American Broadcast News.
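
To make the first model concrete, the following is a minimal sketch of one way to read "name class as a word attribute": each token is a (word, class) pair, a bigram model is estimated over those pairs, and Viterbi search picks the most likely class sequence. It is not the authors' implementation. The toy training sentences, the class labels, the interpolation weight `LAMBDA`, and the particular smoothing (linear interpolation with an add-one class-based fallback) are all assumptions chosen for illustration; the abstract does not specify the smoothing scheme.

```python
import math
from collections import Counter

# Toy training data, invented for illustration: sentences of (word, class) pairs.
TRAIN = [
    [("mister", "O"), ("clinton", "PERSON"), ("visited", "O"), ("boston", "LOCATION")],
    [("clinton", "PERSON"), ("spoke", "O"), ("in", "O"), ("boston", "LOCATION")],
    [("reports", "O"), ("from", "O"), ("london", "LOCATION")],
]
START = ("<s>", "<s>")
LAMBDA = 0.7  # assumed interpolation weight, not a value from the paper

pair_bigram = Counter()    # counts of (previous pair, current pair)
pair_context = Counter()   # counts of a pair used as a context
class_bigram = Counter()   # counts of (previous class, current class)
class_context = Counter()  # counts of a class used as a context
word_in_class = Counter()  # counts of (class, word)
class_total = Counter()    # tokens per class
vocab = set()

for sentence in TRAIN:
    prev = START
    for pair in sentence:
        word, cls = pair
        pair_bigram[(prev, pair)] += 1
        pair_context[prev] += 1
        class_bigram[(prev[1], cls)] += 1
        class_context[prev[1]] += 1
        word_in_class[(cls, word)] += 1
        class_total[cls] += 1
        vocab.add(word)
        prev = pair

CLASSES = sorted(class_total)
V = len(vocab) + 1  # one extra slot shared by all unseen words


def prob(pair, prev):
    """Smoothed P(word, class | previous word, previous class)."""
    word, cls = pair
    # Coarse estimate: add-one smoothed P(class | prev class) * P(word | class).
    p_cls = (class_bigram[(prev[1], cls)] + 1) / (class_context[prev[1]] + len(CLASSES))
    p_word = (word_in_class[(cls, word)] + 1) / (class_total[cls] + V)
    coarse = p_cls * p_word
    ctx = pair_context[prev]
    if ctx == 0:  # context never seen in training: rely on the coarse estimate
        return coarse
    fine = pair_bigram[(prev, pair)] / ctx  # maximum-likelihood pair bigram
    return LAMBDA * fine + (1 - LAMBDA) * coarse


def tag(words):
    """Viterbi search for the most likely class sequence."""
    best = {"<s>": (0.0, [])}  # previous class -> (log probability, path)
    prev_word = "<s>"
    for word in words:
        step = {}
        for cls in CLASSES:
            step[cls] = max(
                (score + math.log(prob((word, cls), (prev_word, pc))), path + [cls])
                for pc, (score, path) in best.items()
            )
        best, prev_word = step, word
    return max(best.values())[1]


print(tag(["clinton", "visited", "london"]))
```

The fallback term matters because most (word, class) pair bigrams never occur in a small training set; without it, any unseen transition would receive zero probability. The second model in the abstract, which keeps word-word and class-class transitions as separate explicit components, would factor the probability differently and is not shown here.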
