Larger-Context Tagging: When and Why Does It Work?
Jinlan Fu, Liangjing Feng, Qi Zhang, Xuanjing Huang, Pengfei Liu

TL;DR
This paper investigates when larger-context training improves sentence-level tagging systems. It compares different methods for aggregating context information and provides an interpretability method to guide future use of contextual information.
Contribution
It offers a comprehensive study on when and why larger-context training works, including a comparative analysis of aggregation methods and an interpretability framework.
Findings
The benefits of larger-context training depend on specific task attributes.
Different aggregation methods vary in effectiveness across datasets.
The interpretability method clarifies when larger-context training improves performance.
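To make the idea of a context aggregator concrete, here is a minimal sketch. It is not one of the paper's four aggregators, which this summary does not describe. It assumes a hypothetical mean-pooling scheme: token vectors from neighboring sentences are averaged, and the result is appended to every token vector of the target sentence. The names `mean_pool`, `aggregate_context`, and `window` are illustrative only.

```python
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def aggregate_context(
    sentences: List[List[Vector]], target: int, window: int = 1
) -> List[Vector]:
    """Append a pooled context vector to every token of sentences[target].

    Hypothetical scheme for illustration: the context is the mean of all
    token vectors in sentences within `window` positions of the target,
    excluding the target sentence itself.
    """
    lo = max(0, target - window)
    hi = min(len(sentences), target + window + 1)
    context_tokens = [
        tok for i in range(lo, hi) if i != target for tok in sentences[i]
    ]
    dim = len(sentences[target][0])
    # With no neighboring sentences, fall back to a zero context vector.
    context = mean_pool(context_tokens) if context_tokens else [0.0] * dim
    return [tok + context for tok in sentences[target]]


# A toy "document" of three sentences with 2-dimensional token vectors.
doc = [[[1.0, 0.0]], [[0.0, 1.0], [2.0, 2.0]], [[3.0, 4.0]]]
enriched = aggregate_context(doc, target=1)
```

Each token of the target sentence ends up twice as wide, so a downstream tagger would see both the local token representation and a summary of the surrounding sentences. A real system would operate on learned encoder outputs and might use a sequential or graph-based aggregator instead of a plain mean.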
Abstract
The development of neural networks and pretraining techniques has spawned many sentence-level tagging systems that achieve superior performance on typical benchmarks. However, a relatively less discussed question is what happens when more context information is introduced into current top-scoring tagging systems. Although several existing works have attempted to shift tagging systems from sentence level to document level, there is still no consensus about when and why this works, which limits the applicability of the larger-context approach in tagging tasks. In this paper, instead of pursuing a state-of-the-art tagging system through architectural exploration, we focus on investigating when and why larger-context training, as a general strategy, can work. To this end, we conduct a thorough comparative study of four proposed aggregators for collecting context information and present an…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks
