Using Entropy Estimates for DAG-Based Ontologies
Andrew Warren, Joao Setubal

TL;DR
This paper introduces a new method for calculating entropy in DAG-based ontologies to improve the estimation of information content for semantic similarity, addressing limitations of traditional frequency-based approaches.
Contribution
It presents a novel entropy calculation for DAG-based ontologies and compares it with existing information content metrics using semantic and sequence similarity.
Findings
New entropy-based IC metric shows improved correlation with semantic similarity
Method outperforms traditional frequency-based IC calculations
Enhanced accuracy in gene annotation similarity assessments
Abstract
Motivation: Entropy measurements on hierarchical structures have been used in methods for information retrieval and natural language modeling. Here we explore their application to semantic similarity. By finding shared ontology terms, semantic similarity can be established between annotated genes. A common procedure for establishing semantic similarity is to calculate the descriptiveness (information content) of ontology terms and use these values to determine the similarity of annotations. Most often, information content is calculated for an ontology term by analyzing its frequency in an annotation corpus. The inherent problems in using these values to model functional similarity motivate our work. Summary: We present a novel calculation for establishing the entropy of a DAG-based ontology, which can be used in an alternative method for establishing the information content of its terms.…
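For orientation, the traditional frequency-based information content that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the common baseline (IC as the negative log of a term's annotation frequency, counting annotations to descendants), not the authors' entropy-based method, which the abstract does not specify. The toy DAG, the gene names, and the helper names `ancestors` and `information_content` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy DAG: term -> list of parents (a term may have several parents).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "atp_binding": ["binding"],
    "kinase": ["catalysis", "binding"],
}

# Hypothetical annotation corpus: gene -> directly annotated terms.
ANNOTATIONS = {
    "geneA": ["atp_binding"],
    "geneB": ["kinase"],
    "geneC": ["kinase", "atp_binding"],
    "geneD": ["catalysis"],
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content(annotations):
    """Frequency-based IC: -log2 of the fraction of genes annotated
    to a term directly or through any of its descendants."""
    counts = Counter()
    for terms in annotations.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)  # propagate each annotation upward
        counts.update(covered)          # count each gene once per term
    total = len(annotations)
    return {term: -math.log2(counts[term] / total) for term in counts}


IC = information_content(ANNOTATIONS)
```

In this toy corpus the root covers every gene and so carries no information, while more specific terms such as `kinase` score higher. Because the values depend entirely on which genes happen to be annotated, the scores shift with the corpus, which is the kind of limitation the abstract points to.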
Taxonomy
Topics: Semantic Web and Ontologies · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
