Fine-tuning a Subtle Parsing Distinction Using a Probabilistic Decision Tree: the Case of Postnominal "that" in Noun Complement Clauses vs. Relative Clauses
Zineddine Tighidet, Nicolas Ballier

TL;DR
This paper explores methods to distinguish between relative and noun complement clauses in English, focusing on the use of probabilistic decision trees and corpus relabeling to improve parsing accuracy.
Contribution
It introduces a novel application of TreeTagger to distinguish postnominal "that" in relative clauses from postnominal "that" in noun complement clauses. It also analyzes the effect of training set size and the representativeness of the corpus.
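The training-set-size analysis can be pictured as a learning curve. A model is trained on nested subsets of increasing size, and each model is scored only on the postnominal "that" tokens. The Python sketch below is not the authors' code. It uses hypothetical tag names (THAT_REL, THAT_COMP) and hypothetical helper names, and it leaves out the actual TreeTagger training and tagging steps.

```python
import random

# Hypothetical tag names for the two uses of postnominal "that" (assumption).
TARGET_TAGS = {"THAT_REL", "THAT_COMP"}


def nested_subsets(sentences, fractions, seed=0):
    """Shuffle once, then take growing prefixes, so every smaller
    training set is contained in each larger one."""
    order = list(sentences)
    random.Random(seed).shuffle(order)
    subsets = []
    for frac in sorted(fractions):
        n = max(1, round(frac * len(order)))
        subsets.append((frac, order[:n]))
    return subsets


def target_accuracy(gold_tags, predicted_tags):
    """Accuracy computed only over tokens whose gold tag is a target tag."""
    pairs = [(g, p) for g, p in zip(gold_tags, predicted_tags) if g in TARGET_TAGS]
    if not pairs:
        return None  # the evaluation set contains no target token
    return sum(g == p for g, p in pairs) / len(pairs)


if __name__ == "__main__":
    for frac, subset in nested_subsets(range(100), [0.1, 0.5, 1.0]):
        # At this point one would write `subset` in TreeTagger's training
        # format, train a model, tag a held-out set and call target_accuracy().
        print(frac, len(subset))
```

Nested subsets are one possible design choice. Each larger set only adds data to the smaller ones, which limits the extent to which differences along the curve reflect resampling noise.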
Findings
TreeTagger effectively learns the distinction with sufficient training data
Relabeling the corpus improves parsing accuracy
GUM Treebank files are somewhat representative of the structures studied
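One simple way to check how representative individual files are is to compare each file's proportion of relative versus complement "that" with the proportion pooled over all files. The Python sketch below illustrates this idea and is not the measure used in the paper. The tag names and the input layout (a file name mapped to a list of (form, tag) pairs) are assumptions.

```python
from collections import Counter

# Hypothetical tag names for the two uses of postnominal "that" (assumption).
REL, COMP = "THAT_REL", "THAT_COMP"


def per_file_counts(tagged_files):
    """Map each file name to (n_rel, n_comp, share of relative uses).

    `tagged_files` maps a file name to a list of (form, tag) pairs;
    the share is None when a file contains no target token."""
    result = {}
    for name, tokens in tagged_files.items():
        counts = Counter(tag for _, tag in tokens if tag in (REL, COMP))
        n_rel, n_comp = counts[REL], counts[COMP]
        total = n_rel + n_comp
        result[name] = (n_rel, n_comp, n_rel / total if total else None)
    return result


def deviation_from_pooled(counts):
    """Difference between each file's share of relative uses and the share
    pooled over all files (None where a file has no target token)."""
    n_rel = sum(r for r, _, _ in counts.values())
    total = sum(r + c for r, c, _ in counts.values())
    if total == 0:
        return {name: None for name in counts}
    pooled = n_rel / total
    return {
        name: (share - pooled if share is not None else None)
        for name, (_, _, share) in counts.items()
    }
```

Files with large deviations, or with no target tokens at all, are candidates for being unrepresentative of the two structures.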
Abstract
In this paper, we investigated two methods for parsing relative and noun complement clauses in English, using distinct tags for "that" as a relative pronoun and as a complementizer. In our first experiment, we used an algorithm to relabel a corpus parsed with the GUM Treebank using Universal Dependencies. In our second experiment, we used TreeTagger, a probabilistic decision tree tagger, to learn the distinction between the complement and relative uses of postnominal "that". We investigated the effect of training set size on TreeTagger's accuracy and how representative the GUM Treebank files are of the two structures under scrutiny. We discussed some of the linguistic and structural tenets of the learnability of this distinction.
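A relabeling rule of this kind can be sketched over CoNLL-U input. The sketch below is an illustration, not the authors' algorithm. It makes the following assumptions:

- It follows standard UD conventions. A relative "that" sits inside a clause attached to the noun as `acl:relcl`. A complementizer "that" is a `mark` dependent of a clause attached to the noun as `acl`.
- A "nominal" head is a token tagged NOUN, PROPN or PRON.
- The output tags THAT_REL and THAT_COMP are hypothetical.

The last helper writes one token and its tag per line, separated by a tab. This is the layout TreeTagger's training input uses, but it should be checked against the TreeTagger documentation.

```python
# Hypothetical output tags (assumption, not taken from the paper).
REL_TAG = "THAT_REL"
COMP_TAG = "THAT_COMP"
# Assumed set of head categories counted as nominal.
NOMINAL_UPOS = {"NOUN", "PROPN", "PRON"}


def parse_conllu_sentence(block):
    """Parse one CoNLL-U sentence into token dicts, skipping comment lines,
    multiword-token ranges (e.g. 3-4) and empty nodes (e.g. 5.1)."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens.append({
            "id": int(cols[0]), "form": cols[1], "upos": cols[3],
            "xpos": cols[4], "head": int(cols[6]), "deprel": cols[7],
        })
    return tokens


def relabel_that(tokens):
    """Return (form, tag) pairs, retagging postnominal 'that' as a relative
    pronoun or a complementizer and keeping XPOS for every other token."""
    by_id = {tok["id"]: tok for tok in tokens}
    pairs = []
    for tok in tokens:
        tag = tok["xpos"]
        if tok["form"].lower() == "that":
            pred = by_id.get(tok["head"])  # predicate of the clause
            noun = by_id.get(pred["head"]) if pred else None  # what the clause modifies
            postnominal = (
                noun is not None
                and noun["upos"] in NOMINAL_UPOS
                and noun["id"] < tok["id"]
            )
            if postnominal and pred["deprel"] == "acl:relcl":
                tag = REL_TAG  # e.g. "the book that I read"
            elif postnominal and pred["deprel"] == "acl" and tok["deprel"] == "mark":
                tag = COMP_TAG  # e.g. "the fact that he left"
        pairs.append((tok["form"], tag))
    return pairs


def to_training_lines(pairs):
    """One token per line, form and tag separated by a tab."""
    return "\n".join(f"{form}\t{tag}" for form, tag in pairs)
```

The rule does not change "that" after a verb, as in "I know that he left". Its clause depends on a verb rather than a noun, so the token keeps its original tag.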
