Effective Use of Transformer Networks for Entity Tracking
Aditya Gupta, Greg Durrett

TL;DR
This paper investigates how pre-trained transformer networks can be adapted for entity tracking in procedural texts, revealing that input restructuring improves performance but models still struggle with complex process understanding.
Contribution
The paper demonstrates that restructuring the input guides transformers to focus on a particular entity, achieving state-of-the-art results on recipe and scientific process tasks.
Findings
Restructuring input improves transformer focus on entities.
Transformers achieve state-of-the-art results on recipe and scientific process tasks.
Models mainly rely on shallow context clues, not complex representations.
Abstract
Tracking entities in procedural language requires understanding the transformations arising from actions on entities as well as those entities' interactions. While self-attention-based pre-trained language encoders like GPT and BERT have been successfully applied across a range of natural language understanding tasks, their ability to handle the nuances of procedural texts is still untested. In this paper, we explore the use of pre-trained transformer networks for entity tracking tasks in procedural text. First, we test standard lightweight approaches for prediction with pre-trained transformers, and find that these approaches underperform even simple baselines. We show that much stronger results can be attained by restructuring the input to guide the transformer model to focus on a particular entity. Second, we assess the degree to which transformer networks capture the process…
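The input restructuring mentioned in the abstract could look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the input format, so the function names, the `[SEP]` and `[CLS]` marker strings, and the choice to place the entity before the process text are all hypothetical. The sketch only shows the general idea of building one input per (entity, step) pair so that the encoder is conditioned on the entity being tracked.

```python
from typing import List


def build_entity_conditioned_input(
    entity: str,
    steps: List[str],
    upto: int,
    sep: str = "[SEP]",  # hypothetical separator marker
    cls: str = "[CLS]",  # hypothetical classification marker
) -> str:
    """Return one text input that names the entity first, then the
    process steps up to and including step index `upto`.

    The layout (entity, separator, context, classification marker)
    is an assumption made for illustration only.
    """
    if not 0 <= upto < len(steps):
        raise ValueError("upto must index an existing step")
    context = " ".join(steps[: upto + 1])
    return f"{entity} {sep} {context} {cls}"


def build_all_inputs(entities: List[str], steps: List[str]) -> List[List[str]]:
    """Build one input per (entity, step) pair: result[i][t] is the
    input for entities[i] after reading steps 0..t."""
    return [
        [build_entity_conditioned_input(e, steps, t) for t in range(len(steps))]
        for e in entities
    ]


if __name__ == "__main__":
    recipe = ["Mix flour and water.", "Bake the dough."]
    for row in build_all_inputs(["flour", "water"], recipe):
        for text in row:
            print(text)
```

Each resulting string would then be tokenized and passed to a pre-trained encoder with a small prediction head. Building a separate input per entity means the same process text is encoded several times, which trades extra compute for an explicit signal about which entity the prediction concerns.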
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Linear Warmup With Cosine Annealing · Byte Pair Encoding · Dense Connections
