Contrastive Learning on LLM Back Generation Treebank for Cross-domain Constituency Parsing

Peiming Guo; Meishan Zhang; Jianling Li; Min Zhang; Yue Zhang

arXiv:2505.20976·cs.CL·May 28, 2025

TL;DR

This paper uses large language models to generate cross-domain constituency treebanks automatically, then pre-trains a parser on the generated treebank with span-level contrastive learning, reporting state-of-the-art cross-domain parsing results.

Contribution

The paper proposes LLM back generation for creating cross-domain treebanks and a span-level contrastive learning strategy to improve constituency parsing.

Findings

01 Achieves state-of-the-art cross-domain parsing performance

02 Effectively generates cross-domain treebanks using LLM back generation

03 Contrastive learning significantly boosts parsing accuracy

Abstract

Cross-domain constituency parsing remains an unsolved challenge in computational linguistics because available multi-domain constituency treebanks are limited. In this paper, we investigate automatic treebank generation with large language models (LLMs). Because LLMs perform poorly on constituency parsing itself, we propose a novel treebank generation method, LLM back generation, which resembles the reverse process of constituency parsing. LLM back generation takes as input an incomplete cross-domain constituency tree whose only leaf nodes are domain keywords and fills in the missing words to generate a cross-domain constituency treebank. We also introduce a span-level contrastive learning pre-training strategy to make full use of the LLM back generation treebank for cross-domain constituency parsing. We verify the effectiveness of our LLM back generation treebank coupled…
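To make the input to LLM back generation concrete, the following is a minimal sketch of how an incomplete tree with only keyword leaves might be built from a bracketed tree. It is an illustration under assumptions, not the authors' implementation: the `[MASK]` placeholder, the `mask_non_keywords` helper, the example sentence, and the keyword set are all hypothetical, and the abstract does not specify the actual tree format or how keywords are chosen.

```python
import re

# Hypothetical placeholder for a missing word; the paper's actual format may differ.
MASK = "[MASK]"

# A leaf in PTB-style bracketing looks like "(TAG word)".
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def mask_non_keywords(bracketed, keywords):
    """Replace every leaf word not in `keywords` with MASK, keeping the tree structure."""

    def repl(match):
        tag, word = match.group(1), match.group(2)
        kept = word if word.lower() in keywords else MASK
        return f"({tag} {kept})"

    return LEAF.sub(repl, bracketed)


# Illustrative tree and domain keywords (both made up for this sketch).
tree = "(S (NP (DT The) (NN enzyme)) (VP (VBZ binds) (NP (DT the) (NN substrate))))"
skeleton = mask_non_keywords(tree, {"enzyme", "substrate"})
print(skeleton)
```

In this sketch, the resulting skeleton keeps every bracket and label while leaving only the keyword leaves in place. A prompt to an LLM would then ask it to fill the masked positions, which is the "reverse" direction relative to parsing a finished sentence.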

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies