Focused Contrastive Training for Test-based Constituency Analysis
Benjamin Roth, Erion Çano

TL;DR
This paper introduces a contrastive self-training method for grammaticality models in constituency analysis, leveraging syntactic tests to improve model discrimination between grammatical and ungrammatical sentences.
Contribution
It presents a novel contrastive training scheme that uses syntactic test transformations and selective positive instances to enhance constituency analysis models.
Findings
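Consistent gains are achieved when only certain positive instances are chosen for training, namely those that could themselves be the result of a test transformation.
Negative instances are corpus sentences perturbed by a syntactic test, a transformation motivated by constituency theory.
Because positives and negatives then exhibit similar characteristics, the objective becomes more challenging for the language model, and markup can indicate where in the sentence the test was applied.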
Abstract
We propose a scheme for self-training of grammaticality models for constituency analysis based on linguistic tests. A pre-trained language model is fine-tuned by contrastive estimation of grammatical sentences from a corpus and ungrammatical sentences that were perturbed by a syntactic test, a transformation that is motivated by constituency theory. We show that consistent gains can be achieved if only certain positive instances are chosen for training, depending on whether they could be the result of a test transformation. This way, the positives and negatives exhibit similar characteristics, which makes the objective more challenging for the language model and also allows for additional markup that indicates the position of the test application within the sentence.
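The following is a minimal sketch of the idea in the abstract, not the authors' implementation. It assumes a hypothetical fronting test (move a span to the start of the sentence, followed by a comma), bracket tokens as the position markup, and a pairwise logistic loss as the contrastive objective; the abstract specifies none of these details. All function names are illustrative.

```python
import math

def front_span(tokens, start, end):
    """Hypothetical movement test: move tokens[start:end] to the front, then a comma.
    Applied to an arbitrary span, the result is usually ungrammatical (a negative)."""
    return tokens[start:end] + [","] + tokens[:start] + tokens[end:]

def fronted_span_length(tokens):
    """Return the length of the span before the first comma if the sentence
    looks like it could be an output of front_span, otherwise None."""
    if "," in tokens:
        k = tokens.index(",")
        if 0 < k < len(tokens) - 1:
            return k
    return None

def mark_front(tokens, k):
    """Assumed markup: bracket the first k tokens to show where the test applied."""
    return ["["] + tokens[:k] + ["]"] + tokens[k:]

def pairwise_loss(score_pos, score_neg):
    """Logistic pairwise loss: small when the positive outscores the negative."""
    return math.log1p(math.exp(score_neg - score_pos))

# A corpus sentence without a comma is not a candidate positive under this filter.
source = "she read the book yesterday".split()
# A negative: front a span that is not a constituent.
negative = front_span(source, 1, 3)
# A candidate positive: a corpus sentence that could be the result of the test.
positive = "yesterday , she read the book".split()

marked_positive = mark_front(positive, fronted_span_length(positive))
marked_negative = mark_front(negative, 2)
```

In an actual training setup, the scores passed to `pairwise_loss` would come from the fine-tuned language model applied to the marked positive and negative; here they are left as plain numbers so the sketch stays self-contained.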
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Test
