A Dataset for Tracking Entities in Open Domain Procedural Text
Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy

TL;DR
This paper introduces OPENPI, a large dataset for tracking open-vocabulary state changes in procedural text, enabling more flexible and detailed understanding of entity transformations across diverse domains.
Contribution
The paper presents a new task formulation and a high-quality dataset for tracking state changes in procedural text with open vocabulary, advancing beyond previous limited attribute sets.
Findings
A current state-of-the-art generation model achieves only 16.1% F1 on this task, leaving substantial room for improvement.
The dataset covers nearly 30,000 state changes across 4,050 sentences.
OPENPI enables research on more flexible entity state tracking.
Abstract
We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and entities involved, and ask how those entities change for just a small, pre-defined set of attributes (e.g., location), limiting their fidelity. Our solution is a new task formulation where given just a procedural text as input, the task is to generate a set of state change tuples (entity, attribute, before-state, after-state) for each step, where the entity, attribute, and state values must be predicted from an open vocabulary. Using crowdsourcing, we create OPENPI, a high-quality (91.5% coverage as judged by humans and completely vetted), and large-scale…
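As a rough illustration of the output format described in the abstract, the sketch below models one state change tuple (entity, attribute, before-state, after-state) and groups tuples by step. The class name, field names, helper function, sentence template, and the attribute values ("texture", "transparency", "smooth") are assumptions made for this example; they are not taken from the dataset or from the authors' code.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class StateChange:
    """One open-vocabulary state change tuple (hypothetical representation)."""
    entity: str
    attribute: str
    before_state: str
    after_state: str

    def as_sentence(self) -> str:
        # Assumed, illustrative template for rendering a tuple as text.
        return (f"{self.attribute} of {self.entity} was "
                f"{self.before_state} before and {self.after_state} afterwards")


# One list of tuples per step of the procedure.
# The values are illustrative only, loosely following the car-window example.
predicted_steps: List[List[StateChange]] = [
    [StateChange("car window", "texture", "smooth", "sticky")],
    [StateChange("car window", "transparency", "opaque", "clear")],
]


def entities_changed(steps: List[List[StateChange]]) -> Set[str]:
    """Collect every entity that changes state in at least one step."""
    return {change.entity for step in steps for change in step}


for index, step in enumerate(predicted_steps, start=1):
    for change in step:
        print(f"step {index}: {change.as_sentence()}")
```

Because every field is a free-form string, nothing restricts the entity, attribute, or state values to a fixed list, which is the property that separates this formulation from tasks with a small pre-defined attribute set.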
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
