Complex Structure Leads to Overfitting: A Structure Regularization Decoding Method for Natural Language Processing
Xu Sun, Weiwei Sun, Shuming Ma, Xuancheng Ren, Yi Zhang, Wenjie Li, Houfeng Wang

TL;DR
This paper introduces a structure regularization decoding method to reduce overfitting in complex structured models for NLP, improving performance on sequence labeling and parsing tasks.
Contribution
It proposes a novel structure regularization decoding approach that leverages simple models to regularize complex models, backed by theoretical analysis and empirical validation.
Findings
F1 error rate reduced by up to 36.4% on sequence labeling tasks.
UAS improved by up to 5.5% on parsing.
The method matches or outperforms state-of-the-art results.
Abstract
Recent systems on structured prediction focus on increasing the level of structural dependencies within the model. However, our study suggests that complex structures entail high overfitting risks. To control the structure-based overfitting, we propose to conduct structure regularization decoding (SR decoding). The decoding of the complex structure model is regularized by the additionally trained simple structure model. We theoretically analyze the quantitative relations between the structural complexity and the overfitting risk. The analysis shows that complex structure models are prone to the structure-based overfitting. Empirical evaluations show that the proposed method improves the performance of the complex structure models by reducing the structure-based overfitting. On the sequence labeling tasks, the proposed method substantially improves the performance of the complex neural…
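The idea of letting a simple structure model regularize the decoding of a complex one can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the two models' log-scores are combined additively with a hypothetical `weight`, and it uses exhaustive search over label sequences instead of an efficient decoder. The names `sr_decode`, `simple_score`, `complex_score`, and all score values are made up for illustration.

```python
from itertools import product

LABELS = ("N", "V")

# Hypothetical per-position log-scores of a simple (order-0) model.
POSITION_SCORES = [
    {"N": 0.0, "V": -2.0},
    {"N": -2.0, "V": 0.0},
    {"N": 0.0, "V": -2.0},
]

# Hypothetical trigram log-scores of a complex (higher-order) model.
# The high score for ("N", "N", "N") stands in for an overfit pattern.
TRIGRAM_SCORES = {("N", "N", "N"): 1.0, ("N", "V", "N"): 0.5}


def simple_score(seq):
    """Sum of independent per-position scores."""
    return sum(POSITION_SCORES[i][label] for i, label in enumerate(seq))


def complex_score(seq):
    """Sum of scores over every window of three consecutive labels."""
    return sum(TRIGRAM_SCORES.get(seq[i:i + 3], 0.0) for i in range(len(seq) - 2))


def sr_decode(length, labels, complex_fn, simple_fn, weight=1.0):
    """Return the sequence maximizing complex_fn + weight * simple_fn.

    Exhaustive search, so only suitable for tiny toy inputs.
    """
    best_seq, best_total = None, float("-inf")
    for seq in product(labels, repeat=length):
        total = complex_fn(seq) + weight * simple_fn(seq)
        if total > best_total:
            best_seq, best_total = seq, total
    return best_seq


# weight=0.0 decodes with the complex model alone; weight=1.0 adds the simple model.
complex_only = sr_decode(3, LABELS, complex_score, simple_score, weight=0.0)
regularized = sr_decode(3, LABELS, complex_score, simple_score, weight=1.0)
```

In this toy setup, the complex model alone selects the sequence favored by its trigram table, while the combined objective shifts the decision toward the sequence that the simple model also supports.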
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Bioinformatics
