Portability of Syntactic Structure for Language Modeling

Ciprian Chelba

arXiv:cs/0108022 · cs.CL · May 23, 2007

TL;DR

This study examines how well statistical syntactic knowledge transfers across domains in language modeling. A structured language model initialized on out-of-domain WSJ parses reduces word error rate on ATIS more than models initialized from in-domain annotated data.

Contribution

It demonstrates that initializing the structured language model (SLM) with syntactic statistics ported from another domain (WSJ) can outperform initialization from in-domain annotated data (ATIS), as measured by word error rate.

Findings

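01 The ported SLM reduces WER by 0.4% absolute (7% relative) over a baseline WER of 5.8%.

02 The SLM initialized on WSJ parses outperforms initialization from rule-based parser (NLPwin) output and from a small amount of manually parsed in-domain data.

03 The ported model improves WER even though its perplexity (PPL) performance is modest.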

Abstract

The paper presents a study on the portability of statistical syntactic knowledge in the framework of the structured language model (SLM). We investigate the impact of porting SLM statistics from the Wall Street Journal (WSJ) to the Air Travel Information System (ATIS) domain. We compare this approach to applying the Microsoft rule-based parser (NLPwin) for the ATIS data and to using a small amount of data manually parsed at UPenn for gathering the initial SLM statistics. Surprisingly, despite the fact that it performs modestly in perplexity (PPL), the model initialized on WSJ parses outperforms the other initialization methods based on in-domain annotated data, achieving a significant 0.4% absolute and 7% relative reduction in word error rate (WER) over a baseline system whose word error rate is 5.8%; the improvement measured relative to the minimum WER achievable on the N-best lists we…
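
The absolute and relative WER figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and the resulting WER of about 5.4% is implied by the quoted baseline and absolute reduction rather than stated in the text above.

```python
def relative_reduction(baseline: float, absolute_reduction: float) -> float:
    """Return the relative reduction as a fraction of the baseline value."""
    return absolute_reduction / baseline


# Figures quoted in the abstract (percentage points of WER).
baseline_wer = 5.8
absolute_reduction = 0.4

# Implied WER after the reduction (an inference, not a quoted figure).
new_wer = baseline_wer - absolute_reduction
rel = relative_reduction(baseline_wer, absolute_reduction)

print(f"{new_wer:.1f}% WER, {rel:.1%} relative reduction")
```

The computed relative reduction is roughly 6.9%, which matches the abstract's rounded figure of 7%.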

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems