Revisiting Supertagging for Faster HPSG Parsing

Olga Zamaraeva; Carlos Gómez-Rodríguez

arXiv:2309.07590 · cs.CL · October 10, 2024

Open Access

TL;DR

This paper introduces new supertaggers trained on high-quality HPSG treebanks, demonstrating significant improvements in parsing speed and accuracy using SVM and BERT-based models across diverse datasets.

Contribution

It presents novel supertagging models for HPSG parsing, including fine-tuned BERT, and evaluates their impact on parsing efficiency and accuracy on challenging datasets.

Findings

01

The fine-tuned BERT-based supertagger achieves 97.26% accuracy on 950 sentences from WSJ section 23.

02

Supertagging speeds up parsing by a factor of 3.

03

New datasets are provided for broader evaluation.

Abstract

We present new supertaggers trained on English grammar-based treebanks and test the effects of the best tagger on parsing speed and accuracy. The treebanks are produced automatically by large manually built grammars and feature high-quality annotation based on a well-developed linguistic theory (HPSG). The English Resource Grammar treebanks include diverse and challenging test datasets, beyond the usual WSJ section 23 and Wikipedia data. HPSG supertagging has previously relied on MaxEnt-based models. We use SVM and neural CRF- and BERT-based methods and show that both SVM and neural supertaggers achieve considerably higher accuracy compared to the baseline and lead to an increase not only in the parsing speed but also the parser accuracy with respect to gold dependency structures. Our fine-tuned BERT-based tagger achieves 97.26% accuracy on 950 sentences from WSJ23 and 93.88% on the…
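To make the accuracy figures above concrete, here is a minimal sketch of how a supertagging accuracy score could be computed. It assumes accuracy means the fraction of tokens whose predicted supertag (lexical type) matches the gold one; the paper's exact scoring protocol may differ. The function name `tagging_accuracy` and the tag strings are illustrative placeholders, not taken from the paper or its treebanks.

```python
from typing import Sequence


def tagging_accuracy(
    gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]
) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag.

    Assumption: token-level exact match, pooled over all sentences.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence must have one predicted tag per token")
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
        total += len(gold_sent)
    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


# Hypothetical supertags for two three-token sentences (placeholder names).
gold = [["d_-_the_le", "n_-_c_le", "v_-_le"], ["n_-_pn_le", "v_np_le", "n_-_mc_le"]]
pred = [["d_-_the_le", "n_-_c_le", "v_np_le"], ["n_-_pn_le", "v_np_le", "n_-_mc_le"]]

accuracy = tagging_accuracy(gold, pred)
print(f"{accuracy:.2%}")  # 5 of 6 tokens match: 83.33%
```

Pooling over tokens rather than averaging per sentence keeps long sentences from being underweighted; a per-sentence average would be an equally plausible convention and would give different numbers on uneven data.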

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Support Vector Machine