Empirical Evaluation of Tree distances for Parser Evaluation
Taraka Rama

TL;DR
This paper empirically compares tree distance measures from biology for parser evaluation, showing high correlation with standard metrics and analyzing their effectiveness in assessing parser accuracy.
Contribution
It introduces and evaluates tree distance measures (RF, QD, and variants) for parser evaluation, demonstrating their correlation with established metrics.
Findings
RF measure correlates with EvalB scores
Tree distances provide alternative evaluation metrics
High correlation observed between different measures
Abstract
In this empirical study, I compare various tree distance measures -- originally developed in computational biology for the purpose of tree comparison -- for the purpose of parser evaluation. I will control for the parser setting by comparing the automatically generated parse trees from the state-of-the-art parser Charniak, 2000) with the gold-standard parse trees. The article describes two different tree distance measures (RF and QD) along with its variants (GRF and GQD) for the purpose of parser evaluation. The article will argue that RF measure captures similar information as the standard EvalB metric (Sekine and Collins, 1997) and the tree edit distance (Zhang and Shasha, 1989) applied by Tsarfaty et al. (2011). Finally, the article also provides empirical evidence by reporting high correlations between the different tree distances and EvalB metric's scores.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Machine Learning in Bioinformatics · Topic Modeling
