An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB

Michael J. Bell; Colin S. Gillespie; Daniel Swan; Phillip Lord

arXiv:1208.2175 · cs.CE · August 22, 2013 · 5 cites

TL;DR

This study investigates the quality of biological annotations in UniProtKB by analyzing word reuse patterns and applying Zipf's Law, providing a potential metric for assessing annotation reliability over time.

Contribution

It introduces a novel approach using power-law distributions of word reuse to evaluate annotation quality and distinguishes between manual and automated annotations in UniProtKB.

Findings

1. Clear trends in annotation quality over time.

2. Distinction between manual and automated annotations.

3. Potential for a generic quality assessment metric.

Abstract

Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly, time-consuming and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, which we aim to address in this article. Specifically, we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use UniProt Knowledge Base (UniProtKB) as a case…
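
To make the idea of relating word reuse to Zipf's law concrete, here is a minimal sketch in Python. It counts word occurrences across a set of annotation strings and estimates the slope of the log-log rank-frequency relation by ordinary least squares. The function names, the tokenisation rule, and the least-squares fit are all assumptions made for illustration; the excerpt above does not say how the authors tokenise annotations or fit their power-law distributions, and a log-log regression is only a crude estimator of a power-law exponent.

```python
import math
import re
from collections import Counter


def word_counts(annotations):
    """Count how often each lowercased word occurs across all annotation strings.

    The tokeniser (runs of ASCII letters and digits) is an illustrative assumption.
    """
    counts = Counter()
    for text in annotations:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def rank_frequency_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    Words are ranked from most to least frequent. A slope near -1
    corresponds to the classic Zipf rank-frequency relation.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical annotation snippets, not taken from UniProtKB.
    sample = [
        "Binds ATP and catalyses the transfer of phosphate",
        "Binds DNA and regulates transcription",
        "Catalyses the hydrolysis of ATP",
    ]
    counts = word_counts(sample)
    print(counts.most_common(3))
    print(rank_frequency_slope(counts))
```

Under these assumptions, computing the slope separately for each database release, or separately for manually and automatically annotated entries, would give one number per corpus that can be compared over time.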

Taxonomy

Topics: Biomedical Text Mining and Ontologies