Contrastive Learning on LLM Back Generation Treebank for Cross-domain Constituency Parsing
Peiming Guo, Meishan Zhang, Jianling Li, Min Zhang, Yue Zhang

TL;DR
This paper introduces a method that uses large language models to generate cross-domain constituency treebanks automatically, then improves parsing with contrastive learning pre-training on the generated treebank, reporting state-of-the-art results.
Contribution
The paper proposes LLM back generation for creating cross-domain treebanks and a span-level contrastive learning strategy to improve constituency parsing.
Findings
Achieves state-of-the-art cross-domain parsing performance
Effectively generates cross-domain treebanks using LLM back generation
Contrastive learning significantly boosts parsing accuracy
Abstract
Cross-domain constituency parsing remains an unsolved challenge in computational linguistics because available multi-domain constituency treebanks are limited. In this paper, we investigate automatic treebank generation by large language models (LLMs). Because LLMs perform poorly on constituency parsing, we propose a novel treebank generation method, LLM back generation, which resembles the reverse process of constituency parsing. LLM back generation takes an incomplete cross-domain constituency tree with only domain keyword leaf nodes as input and fills in the missing words to generate the cross-domain constituency treebank. We also introduce a span-level contrastive learning pre-training strategy to make full use of the LLM back generation treebank for cross-domain constituency parsing. We verify the effectiveness of our LLM back generation treebank coupled…
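The input to LLM back generation, as described in the abstract, is a tree whose structure is intact but whose leaves are blank except for domain keywords. A minimal sketch of how such an incomplete tree might be built from a bracketed tree is shown here. The Penn-style bracket format, the `_` placeholder, and the names `MASK`, `tokenize`, and `mask_leaves` are assumptions made for illustration; the abstract does not specify the authors' actual tree encoding or prompt.

```python
import re

MASK = "_"  # hypothetical placeholder for a word the LLM is asked to fill in


def tokenize(tree):
    """Split a bracketed tree into "(", ")", and symbol tokens."""
    return re.findall(r"\(|\)|[^\s()]+", tree)


def mask_leaves(tree, keywords):
    """Keep brackets and labels; blank out every word not in `keywords`.

    `keywords` is a set of lowercase domain keywords (an assumed input).
    """
    tokens = tokenize(tree)
    out = []
    for i, tok in enumerate(tokens):
        if tok in ("(", ")"):
            out.append(tok)
        elif tokens[i - 1] == "(":
            # A symbol directly after "(" is a constituent or POS label.
            out.append(tok)
        else:
            # Any other symbol is a leaf word.
            out.append(tok if tok.lower() in keywords else MASK)
    # Re-join without spaces inside the brackets.
    return " ".join(out).replace("( ", "(").replace(" )", ")")


if __name__ == "__main__":
    full_tree = "(S (NP (DT The) (NN protein)) (VP (VBZ binds) (NP (NN DNA))))"
    print(mask_leaves(full_tree, {"protein", "dna"}))
```

In this sketch the resulting template would be handed to an LLM with an instruction to replace each `_` with a word that fits the label above it, which is the "reverse of parsing" direction the abstract describes: the structure is given and the sentence is produced.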
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
