Detection of construction biases in biological databases: the case of miRBase
Guilherme Bicalho Saturnino, Caio Padoan de Sá Godinho, Denise Fagundes-Lima, Alcides Castro e Silva, Gerald Weber

TL;DR
This study investigates how construction biases influence the network topology of miRBase, revealing a marked topology change linked to a technological shift in miRNA discovery and showing through simulations that such changes are unlikely to be due to chance.
Contribution
The paper introduces a method to detect construction biases in biological databases by analyzing network topology changes over time, specifically applied to miRBase.
Findings
Clustering coefficient varies significantly during database growth.
Major topology change in 2009 linked to technological shift.
Simulations suggest observed changes are due to bias, not chance.
Abstract
Biological databases can be analysed as complex networks, which may reveal some of their underlying biological mechanisms. Frequently, such databases are identified as scale-free networks or as hierarchical networks depending on connectivity distributions or clustering coefficients. Since these databases grow over time, one would expect that their network topology may undergo some changes. Here, we analysed the historical versions of miRBase, a database of microRNAs, for which we performed an alignment of all mature and precursor miRNAs and calculated a pairwise similarity index. We found that the clustering coefficient shows important changes during the growth of this database. For two consecutive versions of the year 2009 we found a strong modification of the network topology, which we were able to associate with a technological change in miRNA discovery. To evaluate if these changes could have…
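To make the quantities in the abstract concrete, here is a minimal sketch of how a clustering coefficient could be computed from a pairwise similarity index. It is not the authors' pipeline: the threshold rule for drawing an edge, the function names (`build_network`, `local_clustering`, `average_clustering`), and the toy sequence labels and scores are all assumptions made for illustration. Repeating the calculation on each historical release of a database would give the kind of clustering-versus-version curve the abstract refers to.

```python
from itertools import combinations

def build_network(similarity, threshold):
    """Link two sequences when their similarity index is >= threshold.

    `similarity` maps an (id_a, id_b) pair to a score. The threshold rule
    is an assumption of this sketch, not a detail taken from the paper.
    """
    nodes = {n for pair in similarity for n in pair}
    adj = {n: set() for n in nodes}
    for (a, b), score in similarity.items():
        if a != b and score >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: undefined cases count as zero
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical toy data: labels and scores are invented for illustration.
toy_similarity = {
    ("a", "b"): 0.90,
    ("a", "c"): 0.80,
    ("b", "c"): 0.85,
    ("c", "d"): 0.70,
    ("a", "d"): 0.20,
}

toy_network = build_network(toy_similarity, threshold=0.6)
toy_clustering = average_clustering(toy_network)
```

In the toy network, nodes `a` and `b` each have a local coefficient of 1, `c` has 1/3, and `d` has 0, so the average is 7/12. Nodes with fewer than two neighbours are counted as zero here; other conventions exclude them from the mean, which changes the value.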
Taxonomy
Topics: Bioinformatics and Genomic Networks · Gene expression and cancer classification · Gene Regulatory Network Analysis
