Sieve-based Coreference Resolution in the Biomedical Domain
Dane Bell, Gus Hahn-Powell, Marco A. Valenzuela-Escárcega, and Mihai Surdeanu

TL;DR
This paper presents a sieve-based, rule-driven coreference resolution system tailored for the biomedical domain, leveraging domain knowledge to improve accuracy over general algorithms.
Contribution
It introduces a novel, rule-based sieve architecture that effectively incorporates biomedical domain knowledge for coreference resolution.
Findings
Achieved a 3.2% increase in throughput for event extraction.
Maintained high precision comparable to syntax-based systems.
Demonstrated the effectiveness of domain-specific sieves.
Abstract
We describe challenges and advantages unique to coreference resolution in the biomedical domain, and a sieve-based architecture that leverages domain knowledge for both entity and event coreference resolution. Domain-general coreference resolution algorithms perform poorly on biomedical documents, because the cues they rely on, such as gender, are largely absent in this domain, and because they do not encode domain-specific knowledge, such as the number and type of participants required in chemical reactions. Moreover, it is difficult to directly encode this knowledge into most coreference resolution algorithms because they are not rule-based. Our rule-based architecture uses sequentially applied hand-designed "sieves", with the output of each sieve informing and constraining subsequent sieves. This architecture provides a 3.2% increase in throughput to our Reach event extraction system…
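The sequential sieve idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reflect the Reach system's API: the `Mention` class, the two sieves (`exact_match_sieve`, `type_match_sieve`), and the `run_sieves` driver are hypothetical names chosen for illustration. The sketch assumes mentions arrive sorted by document position, and that each sieve only adds links for mentions that earlier sieves left unresolved, which is one simple way for earlier output to constrain later sieves.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Mention:
    idx: int               # position in the document (mentions assumed sorted by idx)
    text: str
    label: str             # hypothetical type label, e.g. "Protein"
    generic: bool = False  # True for anaphors such as "it" or "this protein"


Links = Dict[int, int]     # anaphor idx -> antecedent idx
Sieve = Callable[[List[Mention], Links], Links]


def exact_match_sieve(mentions: List[Mention], links: Links) -> Links:
    """Link a specific mention to the earliest prior mention with the same text."""
    out = dict(links)
    for m in mentions:
        if m.idx in out or m.generic:
            continue  # already resolved, or left for a later sieve
        for prev in mentions:
            if prev.idx >= m.idx:
                break  # only consider mentions that precede m
            if not prev.generic and prev.text.lower() == m.text.lower():
                out[m.idx] = prev.idx
                break
    return out


def type_match_sieve(mentions: List[Mention], links: Links) -> Links:
    """Link an unresolved generic mention to the nearest prior mention of the same type."""
    out = dict(links)
    for m in mentions:
        if m.idx in out or not m.generic:
            continue
        preceding = [p for p in mentions if p.idx < m.idx]
        for prev in reversed(preceding):
            if not prev.generic and prev.label == m.label:
                out[m.idx] = prev.idx
                break
    return out


def run_sieves(mentions: List[Mention], sieves: List[Sieve]) -> Links:
    """Apply sieves in order; each one sees the links produced so far."""
    links: Links = {}
    for sieve in sieves:
        links = sieve(mentions, links)
    return links


if __name__ == "__main__":
    doc = [
        Mention(0, "Ras", "Protein"),
        Mention(1, "MEK", "Protein"),
        Mention(2, "ras", "Protein"),
        Mention(3, "this protein", "Protein", generic=True),
    ]
    print(run_sieves(doc, [exact_match_sieve, type_match_sieve]))
```

Ordering the sieves from most to least precise means high-confidence links are fixed first and are never overwritten by looser heuristics; a domain-specific constraint (for example, on the type of participant an event requires) would be added as a further sieve or as an extra condition inside one.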
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Semantic Web and Ontologies · Topic Modeling
