Data-driven Coreference-based Ontology Building
Shir Ashury-Tahan, Amir David Nissan Cohen, Nadav Cohen, Yoram Louzoun, and Yoav Goldberg

TL;DR
This paper presents a data-driven method for constructing biomedical ontologies by analyzing coreference chains extracted from a large corpus of biomedical abstracts, using graph analysis to separate hierarchical relations from identity relations and noise.
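The abstract describes the underlying graph as linking two phrases whenever they co-occur in the same coreference chain. Below is a minimal Python sketch of that construction. It assumes chains arrive as lists of mention strings and uses co-occurrence counts as edge weights. The weighting, the whitespace normalization, and the name `build_phrase_graph` are assumptions for illustration, not details given in this summary.

```python
from collections import Counter
from itertools import combinations


def build_phrase_graph(chains):
    """Link every pair of distinct phrases that share a coreference chain.

    `chains` is a list of chains, each a list of mention strings. Returns a
    Counter keyed by sorted (phrase_a, phrase_b) tuples whose value is the
    number of chains in which the two phrases co-occur.
    (Hypothetical helper; edge weighting is an assumption.)
    """
    edges = Counter()
    for chain in chains:
        # Deduplicate within a chain so a repeated mention counts once per chain.
        phrases = sorted({mention.strip() for mention in chain})
        for a, b in combinations(phrases, 2):
            edges[(a, b)] += 1
    return edges


# Toy chains for illustration only (not data from the paper).
chains = [
    ["the kinase", "MAPK1", "this enzyme"],
    ["MAPK1", "ERK2"],
    ["this enzyme", "MAPK1"],
]
graph = build_phrase_graph(chains)
print(graph[("MAPK1", "this enzyme")])  # 2
```

Counting co-occurrences instead of storing a plain edge set keeps enough information to discard rare, possibly noisy links later. This summary does not say whether the paper weights edges this way.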
Contribution
It introduces a novel approach that leverages coreference resolution and graph centrality to automatically build and refine biomedical ontologies from text data.
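The abstract names betweenness centrality as the signal for separating hierarchy, identity and noise edges. Betweenness counts how many shortest paths pass through an edge or node, so high scores mark links that bridge otherwise separate regions of the graph. The sketch below computes edge betweenness with Brandes' algorithm on an unweighted, undirected graph. Using edge rather than node betweenness is an assumption, and so is the name `edge_betweenness`. The summary does not say how the scores are mapped to the three edge types.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph.

    `adj` maps each node to a set of neighbours (symmetric). Returns a dict
    keyed by frozenset({u, v}) giving, for each edge, the sum over unordered
    node pairs of the fraction of their shortest paths using that edge.
    """
    eb = defaultdict(float)
    for s in adj:
        # Single-source BFS recording shortest-path counts and predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    # Each unordered pair was counted once from each endpoint.
    return {e: val / 2 for e, val in eb.items()}


# Toy graph: two triangles joined by a single bridge edge c-d.
edge_list = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
adj = defaultdict(set)
for u, v in edge_list:
    adj[u].add(v)
    adj[v].add(u)
eb = edge_betweenness(dict(adj))
print(sorted(max(eb, key=eb.get)), eb[frozenset(("c", "d"))])  # ['c', 'd'] 9.0
```

In the toy graph, the bridge carries all nine shortest paths between the two triangles. An edge inside a triangle, such as a-b, carries only one.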
Findings
Generated a biomedical ontology with significant overlap with existing human-curated ontologies.
Demonstrated the effectiveness of betweenness centrality in identifying hierarchical relations.
Provided publicly available coreference chains and ontology data for further research.
Abstract
While coreference resolution is traditionally used as a component in understanding individual documents, in this work we take a more global view and explore what we can learn about a domain from the set of all document-level coreference relations present in a large corpus. We derive coreference chains from a corpus of 30 million biomedical abstracts and construct a graph based on the string phrases within these chains, connecting two phrases if they co-occur within the same coreference chain. We then use the graph structure and the betweenness centrality measure to distinguish between edges denoting hierarchy, identity and noise, assign directionality to edges denoting hierarchy, and split nodes (strings) that correspond to multiple distinct concepts. The result is a rich, data-driven ontology over concepts in the biomedical domain, parts of which overlap…
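The abstract also mentions splitting string nodes that correspond to multiple distinct concepts but does not say how this is done. The sketch below uses a common heuristic that is not taken from the paper. It removes the ambiguous node and groups its neighbours by the connected components they form among themselves, treating each component as a separate sense. The function name `split_senses` and the toy graph are hypothetical.

```python
from collections import defaultdict


def split_senses(adj, node):
    """Group a node's neighbours into candidate senses.

    Hypothetical heuristic (not the paper's stated method): drop `node`,
    then return the connected components of the subgraph induced by its
    neighbours, each as a sorted list of phrases.
    """
    neighbours = set(adj[node])
    seen, senses = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first search restricted to the neighbour set (node excluded).
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            component.add(v)
            for w in adj[v]:
                if w in neighbours and w not in seen:
                    seen.add(w)
                    stack.append(w)
        senses.append(sorted(component))
    return senses


# Toy graph: "CAT" links to phrases from two unrelated concepts.
edge_list = [("CAT", "catalase"), ("CAT", "catalase enzyme"),
             ("CAT", "CT scan"), ("CAT", "computed tomography"),
             ("catalase", "catalase enzyme"), ("CT scan", "computed tomography")]
adj = defaultdict(set)
for u, v in edge_list:
    adj[u].add(v)
    adj[v].add(u)
senses = split_senses(adj, "CAT")
print(senses)  # [['CT scan', 'computed tomography'], ['catalase', 'catalase enzyme']]
```

On a real coreference graph, noisy edges can merge components that should stay apart. A heuristic like this would likely need noise edges filtered out first.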
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Sparse Evolutionary Training · Ontology
