# Big Data for Drug Discovery: Some Historical Landscape, Considerations, and Applications for a Medicinal Chemist

**Authors:** João A. L. de Lima, Lucas Silva Franco, Lídia M. Lima

PMC · DOI: 10.1021/acsomega.5c08828 · 2026-01-26

## TL;DR

Big Data has the potential to transform drug discovery: integrating chemical, biological, pharmacological, and clinical data with AI can reduce the time and cost of developing new drugs and individualized therapies.

## Contribution

The paper highlights how Big Data extends beyond ligand discovery to support novel target identification and individualized therapies in medicinal chemistry.

## Key findings

- Combined with combinatorial chemistry, high-throughput screening, and AI, Big Data expedites the identification of compounds with optimal pharmacokinetic and pharmacodynamic profiles.
- Big Data supports the identification of new pharmacological targets through genomic, proteomic, and metabolomic data integration.
- The impact of Big Data extends to later drug development stages, including regulatory evaluation and clinical translation, making it a cornerstone of data-driven drug discovery rather than a supplementary tool.

## Abstract

Big Data (BD) has the potential to transform the process of drug discovery. The integration of chemical, biological, pharmacological, and clinical information facilitates the expeditious conception of high-value projects, thereby enhancing the identification of hits and the generation of superior leads or repositioned candidates while concomitantly reducing time and costs. In this review, we demonstrate that BD extends beyond the scope of ligand discovery, thereby supporting the identification of novel pharmacological targets through the integration of genomic, proteomic, and metabolomic data sets. This integration adds further depth and guides the development of individualized therapies. When combined with combinatorial chemistry, high-throughput screening, and artificial intelligence (AI), BD expedites the identification of compounds that exhibit optimal pharmacokinetic and pharmacodynamic profiles. The impact of BD extends to later stages of drug development, including regulatory evaluation and clinical translation. This demonstrates that BD is no longer a supplementary tool but a cornerstone for rational molecular design, predictive modeling, and data-driven drug discovery. Although the benefits generated by the use of BD and AI in MedChem are evident, the impact of the widespread use of these data and tools raises a series of philosophical questions that need to be discussed since the popularization of large language models (LLMs) has resulted in the generation of promiscuous data, which, from a scientific point of view, lacks the criteria necessary for such data to be considered meaningful. All of these factors demonstrate the need for intentional dialogue on how these tools should be applied within the hermeneutics of biomedical sciences themselves, in order to ensure a lucid discussion on the nature of the method, harmonizing this apparent tension between human and AI, which has been a source of controversy since the exponential rise of ChatGPT and various other LLMs.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12903145/full.md

---
Source: https://tomesphere.com/paper/PMC12903145