Big Data Analytics in Bioinformatics: A Machine Learning Perspective
Hirak Kashyap, Hasin Afzal Ahmed, Nazrul Hoque, Swarup Roy, Dhruba Kumar Bhattacharyya

TL;DR
This paper reviews how machine learning and big data technologies are applied to bioinformatics, highlighting current challenges, recent advances, and future research directions for handling large, complex biological datasets.
Contribution
It provides a comprehensive overview of big data analytics methods and architectures tailored for bioinformatics, identifying gaps and proposing future research opportunities.
Findings
Parallel and incremental machine learning algorithms are increasingly used in bioinformatics.
Graph-based architectures and in-memory tools help optimize iterative processing.
There is a lack of standard big data tools for key bioinformatics problems.
Abstract
Bioinformatics research is characterized by voluminous and incremental datasets and complex data analytics methods. The machine learning methods used in bioinformatics are iterative and parallel. These methods can be scaled to handle big data using distributed and parallel computing technologies. Usually, big data tools perform computation in batch mode and are not optimized for iterative processing or for high data dependency among operations. In recent years, parallel, incremental, and multi-view machine learning algorithms have been proposed. Similarly, graph-based architectures and in-memory big data tools have been developed to minimize I/O cost and optimize iterative processing. However, standard big data architectures and tools are lacking for many important bioinformatics problems, such as fast construction of co-expression and regulatory networks and salient module…
Taxonomy
Topics: Genetics, Bioinformatics, and Biomedical Research · Bioinformatics and Genomic Networks · Gene expression and cancer classification
