Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers

Jakub Zavrel; Walter Daelemans

arXiv:cs/0007018·cs.CL·May 23, 2007·24 cites

TL;DR

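This paper introduces Combi-bootstrap, a method that uses existing heterogeneous taggers and lexical resources as features for a second-level learner, so that a corpus can be annotated with a new tagset from a very small annotated sample. It achieves up to 44.7% error reduction over both the best single tagger and an ensemble tagger built from the same sample.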

Contribution

The paper presents Combi-bootstrap, an approach that feeds the outputs of multiple existing taggers and lexical resources into a second-level machine learning module, which is trained on a very small annotated sample to map them to the new tagset.

Findings

01

Achieves up to 44.7% error reduction compared with the best single tagger.

02

Integrates a wide variety of existing taggers and lexical resources as input features.

03

Outperforms ensemble taggers trained on the same small sample.
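
To make the "error reduction" figure in the findings concrete, the sketch below computes relative error reduction from two accuracy values. The function name and the accuracies are hypothetical illustrations, not numbers from the paper; the formula assumes the usual definition, where the drop in error rate is divided by the baseline error rate.

```python
def relative_error_reduction(baseline_accuracy, new_accuracy):
    """Fraction of the baseline's errors that the new system removes."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


# Hypothetical accuracies, chosen only to illustrate the arithmetic:
# going from 90% to 95% accuracy halves the error rate.
reduction = relative_error_reduction(0.90, 0.95)
print(f"{reduction:.1%}")
```

Under this definition, a large relative reduction can correspond to a modest absolute accuracy gain when the baseline is already accurate.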

Abstract

This paper describes a new method, Combi-bootstrap, to exploit existing taggers and lexical resources for the annotation of corpora with new tagsets. Combi-bootstrap uses existing resources as features for a second-level machine learning module that is trained to make the mapping to the new tagset on a very small sample of annotated corpus material. Experiments show that Combi-bootstrap: i) can integrate a wide variety of existing resources, and ii) achieves much higher accuracy (up to 44.7% error reduction) than both the best single tagger and an ensemble tagger constructed out of the same small training sample.
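
The idea described in the abstract, where the outputs of existing taggers become features for a second-level learner trained on a small annotated sample, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the second-level learner here is a plain lookup table with a majority-tag fallback, standing in for whatever learner the paper actually uses, and all tag names, function names, and data are hypothetical.

```python
from collections import Counter, defaultdict


def train_combiner(feature_rows, gold_tags):
    """Learn a mapping from tuples of existing-tagger outputs to new-tagset tags.

    feature_rows: one tuple per token, holding the tag each existing tagger assigned.
    gold_tags: the manually annotated new-tagset tag for each token.
    """
    by_pattern = defaultdict(Counter)
    overall = Counter()
    for features, gold in zip(feature_rows, gold_tags):
        by_pattern[tuple(features)][gold] += 1
        overall[gold] += 1
    # For each observed combination of tagger outputs, keep its most frequent gold tag.
    table = {pattern: counts.most_common(1)[0][0]
             for pattern, counts in by_pattern.items()}
    # Fallback for unseen combinations: the most frequent gold tag overall.
    default = overall.most_common(1)[0][0]
    return table, default


def predict(model, features):
    """Map one token's existing-tagger outputs to a new-tagset tag."""
    table, default = model
    return table.get(tuple(features), default)


# Hypothetical toy sample: each row is (tag from tagger A, tag from tagger B).
train_features = [
    ("NN", "noun"),
    ("NN", "noun"),
    ("VB", "verb"),
    ("VBD", "verb"),
    ("DT", "det"),
]
# Hypothetical new-tagset annotations for the same five tokens.
train_gold = ["N(sing)", "N(sing)", "V(inf)", "V(past)", "Art"]

model = train_combiner(train_features, train_gold)
print(predict(model, ("VBD", "verb")))  # seen combination
print(predict(model, ("JJ", "adj")))    # unseen combination, uses the fallback
```

The point of the sketch is that the existing taggers need not share a tagset with each other or with the target: the second-level learner only sees their outputs as symbolic features. A real system would likely use a learner that generalizes better to unseen feature combinations than an exact-match table does.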

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression