Efficiently Inducing Features of Conditional Random Fields
Andrew McCallum

TL;DR
This paper introduces a feature induction method for Conditional Random Fields that constructs only those feature conjunctions that significantly increase log-likelihood. Compared with traditional approaches, it improves accuracy and reduces the feature count. The method applies to various CRF structures and is demonstrated on named entity extraction.
Contribution
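The paper presents a feature induction approach for CRFs that constructs only those feature conjunctions that significantly increase log-likelihood. It builds on the method of Della Pietra et al. [1997], altered to work with conditional rather than joint probabilities and modified for tractability in sequence models, and it is adaptable to different CRF structures.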
Findings
Improved accuracy over traditional methods
Significant reduction in feature count
Effective on named entity extraction task
Abstract
Conditional Random Fields (CRFs) are undirected graphical models, a special case of which corresponds to conditionally-trained finite state machines. A key advantage of these models is their great flexibility to include a wide array of overlapping, multi-granularity, non-independent features of the input. In the face of this freedom, an important question remains: what features should be used? This paper presents a feature induction method for CRFs. Founded on the principle of constructing only those feature conjunctions that significantly increase log-likelihood, the approach is based on that of Della Pietra et al. [1997], but altered to work with conditional rather than joint probabilities, and with additional modifications for providing tractability specifically for a sequence model. In comparison with traditional approaches, automated feature induction offers both improved…
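The core idea in the abstract, scoring candidate feature conjunctions by how much they would increase conditional log-likelihood, can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a plain conditional log-linear classifier rather than a sequence CRF, runs a single scoring round from an empty (uniform) model, and finds each candidate's weight by bisection under a Gaussian prior. The names `candidate_gain`, `induce_features`, and `sigma2` are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

def candidate_gain(active, probs_l, is_label, sigma2=10.0):
    """Estimate the gain of adding one feature g(x, y) = active(x) * [y == l].

    active:   bool array, whether the candidate's input test fires on instance i
    probs_l:  current model probability p(l | x_i) for the candidate's label l
    is_label: bool array, whether the true label y_i equals l
    All existing weights stay fixed; only the new weight mu is optimized.
    sigma2 is the variance of an assumed Gaussian prior on mu.
    """
    p = probs_l[active]
    hit = is_label[active].astype(float)

    def grad(mu):
        # q is p(l | x_i) after adding the feature with weight mu.
        q = p * np.exp(mu) / (1.0 + p * (np.exp(mu) - 1.0))
        return np.sum(hit - q) - mu / sigma2

    # The penalized gain is concave in mu, so its gradient is decreasing
    # and bisection on a bounded interval locates the maximizer.
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    gain = np.sum(mu * hit - np.log1p(p * np.expm1(mu))) - mu ** 2 / (2 * sigma2)
    return float(gain), mu

def induce_features(X, y, n_labels, top_k=1):
    """Rank (input test, label) candidates by estimated gain, one round only."""
    n, d = X.shape
    probs = np.full((n, n_labels), 1.0 / n_labels)  # empty model: uniform p(y | x)
    B = X.astype(bool)
    tests = {(j,): B[:, j] for j in range(d)}        # atomic input tests
    for j, k in itertools.combinations(range(d), 2):
        tests[(j, k)] = B[:, j] & B[:, k]            # pairwise conjunctions
    scored = []
    for cols, active in tests.items():
        for l in range(n_labels):
            gain, mu = candidate_gain(active, probs[:, l], y == l)
            scored.append((gain, cols, l, mu))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_k]

# Toy data: the label is 1 only when both atomic features fire.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
best_gain, best_cols, best_label, best_mu = induce_features(X, y, n_labels=2)[0]
```

On this toy data the conjunction of both columns ranks first, because neither atomic feature alone improves on the uniform model. A fuller procedure would add the top candidates to the model, re-estimate all weights, and repeat with new conjunctions built from the features selected so far.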
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
