Syntactically Robust Training on Partially-Observed Data for Open Information Extraction
Ji Qi, Yuxiang Chen, Lei Hou, Juanzi Li, Bin Xu

TL;DR
This paper introduces a training framework for Open Information Extraction (OpenIE) models that improves syntactic robustness by training on diverse generated paraphrases and restoring the knowledge that paraphrasing deforms. The framework is evaluated on a new, syntactically diverse dataset.
Contribution
It proposes a syntactically robust training method, with two knowledge restoration algorithms (semantic similarity matching and syntactic tree walking), aimed at the syntactic diversity of real-world data.
Findings
OpenIE model performance degrades as the syntactic difference from the training data increases.
The proposed framework maintains robustness across diverse syntactic distributions.
A new dataset, CaRB-AutoPara, is used to validate the approach.
Abstract
Open Information Extraction models have shown promising results with sufficient supervision. However, these models face a fundamental challenge: the syntactic distribution of the training data is partially observable in comparison to the real world. In this paper, we propose a syntactically robust training framework that enables models to be trained on a syntactically abundant distribution based on diverse paraphrase generation. To tackle the intrinsic problem of knowledge deformation caused by paraphrasing, two algorithms based on semantic similarity matching and syntactic tree walking are used to restore the expressionally transformed knowledge. The training framework can be generally applied to other syntactically partially observable domains. Based on the proposed framework, we build a new evaluation set called CaRB-AutoPara, a syntactically diverse dataset consistent with the real-world setting for…
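The knowledge restoration step described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the similarity measure, so the code below swaps in plain token-overlap (Jaccard) similarity as a stand-in, and every name in it (`tokens`, `jaccard`, `best_span`, `restore_triple`, the `threshold` and `max_extra` parameters) is hypothetical. It shows only the general idea of re-locating each element of a gold triple inside a paraphrased sentence and discarding the paraphrase when an element cannot be recovered.

```python
from typing import List, Optional, Tuple


def tokens(text: str) -> List[str]:
    """Lowercase and split on whitespace, treating commas and periods as spaces."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set overlap; a lexical stand-in for a real semantic similarity model."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def best_span(phrase: str, sentence: str, max_extra: int = 2) -> Tuple[str, float]:
    """Return the contiguous span of `sentence` most similar to `phrase`."""
    target = tokens(phrase)
    words = tokens(sentence)
    best, best_score = "", 0.0
    max_len = len(target) + max_extra  # allow slightly longer spans than the phrase
    for start in range(len(words)):
        for end in range(start + 1, min(len(words), start + max_len) + 1):
            span = words[start:end]
            score = jaccard(target, span)
            if score > best_score:  # strict '>' keeps the earliest best span
                best, best_score = " ".join(span), score
    return best, best_score


def restore_triple(
    triple: Tuple[str, str, str], paraphrase: str, threshold: float = 0.5
) -> Optional[Tuple[str, ...]]:
    """Map each triple element onto the paraphrase, or return None if any is lost."""
    restored = []
    for element in triple:
        span, score = best_span(element, paraphrase)
        if score < threshold:  # assumed cutoff: treat the knowledge as deformed
            return None
        restored.append(span)
    return tuple(restored)


if __name__ == "__main__":
    gold = ("Marie Curie", "won", "the Nobel Prize")
    print(restore_triple(gold, "The Nobel Prize was won by Marie Curie."))
```

A realistic version would replace `jaccard` with an embedding-based similarity so that paraphrases with changed wording, not just changed word order, can still be matched.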
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
