Joint and conditional estimation of tagging and parsing models
Mark Johnson

TL;DR
This paper compares joint and conditional estimation of statistical tagging and parsing models, finding that models estimated by maximizing joint likelihood outperformed those estimated by maximizing conditional likelihood, even though some of the conditionally estimated models intuitively had access to more information.
Contribution
It provides an empirical comparison showing that, for tagging and parsing models, joint likelihood estimation can be more effective than conditional likelihood estimation.
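The two estimators can be stated precisely. Using notation introduced here for illustration (not necessarily the paper's), with fully observed training pairs (x_i, y_i), where x_i is a word string and y_i its tag sequence or parse, they share a model family P_θ and differ only in the objective they maximize:

```latex
\hat\theta_{\text{joint}} = \arg\max_{\theta} \sum_{i=1}^{n} \log P_\theta(x_i, y_i)

\hat\theta_{\text{cond}} = \arg\max_{\theta} \sum_{i=1}^{n} \log P_\theta(y_i \mid x_i)
  = \arg\max_{\theta} \sum_{i=1}^{n} \Big[ \log P_\theta(x_i, y_i) - \log \sum_{y'} P_\theta(x_i, y') \Big]
```

The second form shows that conditional estimation subtracts the log marginal probability of each input string, so it rewards only how well the model discriminates among analyses y' of the same x, not how well it models the strings themselves.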
Findings
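Joint estimation outperformed conditional estimation in the paper's experiments on tagging and parsing models.
This held even for conditionally estimated models that intuitively had access to more information.
Maximizing conditional likelihood therefore does not necessarily yield better models, even though these applications only use the conditional distribution.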
Abstract
This paper compares two different ways of estimating statistical language models. Many statistical NLP tagging and parsing models are estimated by maximizing the (joint) likelihood of the fully-observed training data. However, since these applications only require the conditional probability distributions, these distributions can in principle be learnt by maximizing the conditional likelihood of the training data. Perhaps somewhat surprisingly, models estimated by maximizing the joint likelihood were superior to models estimated by maximizing the conditional likelihood, even though some of the latter models intuitively had access to "more information".
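To make the two estimators in the abstract concrete, here is a minimal sketch on a hypothetical toy tagger; it is not the paper's models, data, or code. The assumed model is naive-Bayes-style, P(t) times the product of P(x_k | t) over two correlated token features. Joint estimation has a closed form (relative frequencies). The conditional objective, the sum of log P(t_i | x_i), is maximized by a fixed number of gradient-ascent steps on unconstrained log-scores; any such scores can be renormalized into a model of the same form with the same conditional, so this searches the same family of conditionals. All names (`DATA`, `joint_mle`, `conditional_mle`, the step size `lr`) are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the paper): each token has two observed
# features, (word, last letter), plus a gold tag. The features are correlated,
# so the model's conditional-independence assumption is violated.
DATA = [
    (("runs", "s"), "V"), (("dogs", "s"), "N"), (("dogs", "s"), "N"),
    (("run", "n"), "V"), (("dog", "g"), "N"), (("runs", "s"), "V"),
    (("runs", "s"), "N"),
]
TAGS = sorted({t for _, t in DATA})
NEG_INF = float("-inf")


def score(params, x, t):
    """log P(t) + sum_k log P(x_k | t); (t, x_k) pairs missing from a table score -inf."""
    log_prior, log_emit = params
    return log_prior[t] + sum(log_emit[k].get((t, v), NEG_INF) for k, v in enumerate(x))


def posterior(params, x):
    """P(t | x): normalize exp(score) over tags (log-sum-exp for stability)."""
    s = {t: score(params, x, t) for t in TAGS}
    m = max(s.values())
    z = sum(math.exp(v - m) for v in s.values())
    return {t: math.exp(v - m) / z for t, v in s.items()}


def log_joint(params, data):
    """Joint log-likelihood sum_i log P(x_i, t_i); meaningful for normalized params."""
    return sum(score(params, x, t) for x, t in data)


def log_conditional(params, data):
    """Conditional log-likelihood sum_i log P(t_i | x_i)."""
    return sum(math.log(posterior(params, x)[t]) for x, t in data)


def joint_mle(data):
    """Joint estimation: closed-form relative frequencies for P(t) and P(x_k | t)."""
    tag_counts = Counter(t for _, t in data)
    log_prior = {t: math.log(tag_counts[t] / len(data)) for t in TAGS}
    log_emit = []
    for k in range(2):
        pair_counts = Counter((t, x[k]) for x, t in data)
        log_emit.append({(t, v): math.log(c / tag_counts[t])
                         for (t, v), c in pair_counts.items()})
    return log_prior, log_emit


def conditional_mle(data, steps=500, lr=0.05):
    """Conditional estimation: gradient ascent on sum_i log P(t_i | x_i) over
    unconstrained log-scores, starting from all zeros (uniform P(t | x))."""
    values = [sorted({x[k] for x, _ in data}) for k in range(2)]
    log_prior = {t: 0.0 for t in TAGS}
    log_emit = [{(t, v): 0.0 for t in TAGS for v in values[k]} for k in range(2)]
    params = (log_prior, log_emit)
    for _ in range(steps):
        g_prior = {t: 0.0 for t in TAGS}
        g_emit = [{key: 0.0 for key in table} for table in log_emit]
        for x, gold in data:
            post = posterior(params, x)
            for t in TAGS:
                # Gradient of log P(gold | x) w.r.t. score(x, t): 1[t == gold] - P(t | x).
                d = (1.0 if t == gold else 0.0) - post[t]
                g_prior[t] += d
                for k, v in enumerate(x):
                    g_emit[k][(t, v)] += d
        for t in TAGS:
            log_prior[t] += lr * g_prior[t]
        for k in range(2):
            for key in log_emit[k]:
                log_emit[k][key] += lr * g_emit[k][key]
    return params


if __name__ == "__main__":
    joint, cond = joint_mle(DATA), conditional_mle(DATA)
    print("joint MLE:       log P(x,t) =", round(log_joint(joint, DATA), 3),
          "| log P(t|x) =", round(log_conditional(joint, DATA), 3))
    print("conditional MLE: log P(t|x) =", round(log_conditional(cond, DATA), 3))
    print("P(t | runs, s):", posterior(joint, ("runs", "s")), posterior(cond, ("runs", "s")))
```

On data like this, which violates the model's independence assumption, the two estimators generally produce different conditionals P(t | x); which one tags better is an empirical question, and the paper's experiments favored the joint estimates. Some toy tokens are perfectly predictable from a single feature, so the conditional objective has no finite maximizer; this is why the sketch runs a fixed number of steps rather than iterating to convergence.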
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · AI-based Problem Solving and Planning
