Joint and conditional estimation of tagging and parsing models

Mark Johnson

arXiv:cs/0105012 · cs.CL · May 23, 2007 · 3 cites

TL;DR

This paper compares joint and conditional likelihood estimation for tagging and parsing models. It finds that the jointly estimated models performed better, even though some of the conditionally estimated models intuitively had access to more information.

Contribution

It provides an empirical comparison showing that maximizing joint likelihood can yield better tagging and parsing models than maximizing conditional likelihood.

Findings

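01 Joint estimation outperformed conditional estimation in the paper's comparison.

02 Models estimated by maximizing the joint likelihood were superior to those estimated by maximizing the conditional likelihood.

03 Giving a conditionally estimated model access to more information did not necessarily produce better results.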

Abstract

This paper compares two different ways of estimating statistical language models. Many statistical NLP tagging and parsing models are estimated by maximizing the (joint) likelihood of the fully-observed training data. However, since these applications only require the conditional probability distributions, these distributions can in principle be learnt by maximizing the conditional likelihood of the training data. Perhaps somewhat surprisingly, models estimated by maximizing the joint were superior to models estimated by maximizing the conditional, even though some of the latter models intuitively had access to "more information".
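
To make the distinction between the two objectives concrete, here is a minimal Python sketch. It is not the paper's models or experiments: the toy corpus, the saturated relative-frequency model, and all function names are assumptions chosen for illustration. The joint log-likelihood sums log P(word, tag) over the training pairs, while the conditional log-likelihood sums log P(tag | word), obtained by dividing out the word marginal.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, tag) pairs; not data from the paper.
DATA = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
        ("the", "DET"), ("run", "NOUN"), ("run", "VERB")]

def fit_joint(pairs):
    """Relative-frequency estimate of P(word, tag).

    For an unconstrained multinomial over pairs, this is the estimate
    that maximizes the joint likelihood of the training data.
    """
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: c / total for pair, c in counts.items()}

def joint_log_likelihood(model, pairs):
    """Sum of log P(word, tag) over the data."""
    return sum(math.log(model[pair]) for pair in pairs)

def conditional_log_likelihood(model, pairs):
    """Sum of log P(tag | word), where P(tag | word) = P(word, tag) / P(word)."""
    word_marginal = defaultdict(float)
    for (word, _tag), p in model.items():
        word_marginal[word] += p
    return sum(math.log(model[(w, t)] / word_marginal[w]) for w, t in pairs)

model = fit_joint(DATA)
joint_ll = joint_log_likelihood(model, DATA)
cond_ll = conditional_log_likelihood(model, DATA)
print(f"joint log-likelihood:       {joint_ll:.3f}")
print(f"conditional log-likelihood: {cond_ll:.3f}")
```

The two quantities differ by the sum of log P(word) over the data, so the conditional objective ignores how well the model predicts the words themselves. A conditional estimator would choose parameters to maximize the second quantity directly instead of the first.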

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · AI-based Problem Solving and Planning