# One-hot news: drug synergy models shortcut molecular features

**Authors:** Emine Beyza Çandır, Halil İbrahim Kuru, Magnus Rattray, A Ercüment Çiçek, Oznur Tastan

PMC · DOI: 10.1093/bioinformatics/btag040 · Bioinformatics · 2026-01-24

## TL;DR

This paper shows that published drug synergy models treat drug and cell line representations mainly as identifiers rather than learning from the molecular features they encode, which limits their ability to generalize to unseen drugs and cell lines.

## Contribution

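The study shows that replacing the molecular representations in published drug synergy models with one-hot encodings preserves performance, which implies that the models use their inputs as identifiers rather than as features. It highlights the need for strategies that learn from the intended features and generalize to unseen drugs and cell lines.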

## Key findings

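- Replacing drug and cell line representations with one-hot encodings yields comparable or even slightly improved performance across diverse published synergy prediction models.
- In synthetic data experiments, models can learn from the true features, but when drugs and cell lines recur across drug–drug–cell triplets, this repeating structure impairs feature-based learning; models instead exploit covariation in the synergy labels.
- Current models can help prioritize drug pairs within a panel of already tested drugs and cell lines, but they struggle to generalize to unseen drugs or cell lines.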
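The distinction between prioritizing pairs within a tested panel and generalizing to unseen drugs can be made concrete by checking which entities in a test set never occur in training. The following is a minimal sketch, not the paper's evaluation protocol: the function names `leave_drug_out_split` and `unseen_in_test`, the placeholder drug names, and the toy panel are all assumptions introduced for illustration.

```python
from itertools import combinations

def leave_drug_out_split(triplets, held_out_drugs):
    """Put every triplet that contains a held-out drug into the test set."""
    held = set(held_out_drugs)
    train = [t for t in triplets if t[0] not in held and t[1] not in held]
    test = [t for t in triplets if t[0] in held or t[1] in held]
    return train, test

def unseen_in_test(train, test):
    """Return (drugs, cell lines) that occur in test but never in train."""
    train_drugs = {d for a, b, _ in train for d in (a, b)}
    test_drugs = {d for a, b, _ in test for d in (a, b)}
    train_cells = {c for _, _, c in train}
    test_cells = {c for _, _, c in test}
    return test_drugs - train_drugs, test_cells - train_cells

# Hypothetical toy panel: 4 placeholder drugs, 2 cell lines, all pairs tested.
drugs = ["drug_A", "drug_B", "drug_C", "drug_D"]
cell_lines = ["PC3", "MCF7"]
triplets = [(a, b, c) for a, b in combinations(drugs, 2) for c in cell_lines]

# Split 1: hold out whole triplets; every drug and cell line still recurs in train.
in_panel_test = triplets[::4]
in_panel_train = [t for t in triplets if t not in in_panel_test]
in_panel_unseen = unseen_in_test(in_panel_train, in_panel_test)

# Split 2: hold out one drug entirely; the test set now contains an unseen drug.
train, test = leave_drug_out_split(triplets, ["drug_D"])
new_drugs, new_cells = unseen_in_test(train, test)
```

In the first split an identifier-only model has already seen every drug and cell line it is tested on, so memorized label covariation can carry over. In the second split, `drug_D` has no training examples, so only information carried by its features could help.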

## Abstract

Combinatorial drug therapy holds great promise for tackling complex diseases, but the vast number of possible drug combinations makes exhaustive experimental testing infeasible. Computational models have been developed to guide experimental screens by assigning synergy scores to drug pair–cell line combinations. These models take as input structural and chemical information on drugs and molecular features of cell lines. Their premise is that they leverage this biological and chemical information to predict synergy measurements.

In this study, we demonstrate that replacing drug and cell line representations with simple one-hot encodings results in comparable or even slightly improved performance across diverse published drug combination models. This unexpected finding suggests that current models use these representations primarily as identifiers and exploit covariation in the synergy labels. Our synthetic data experiments show that models can learn from the true features; however, when drugs and cell lines recur across drug–drug–cell triplets, this repeating structure impairs feature-based learning. While the current synergy prediction models can aid in prioritizing drug pairs within a panel of tested drugs and cell lines, our results highlight the need for better strategies to learn from intended features and to generalize to unseen drugs and cell lines.
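To illustrate what replacing representations with one-hot encodings means in practice, the sketch below encodes each drug–drug–cell triplet as three concatenated indicator blocks. It is an assumption-laden illustration rather than the authors' code: the function name `one_hot_triplets`, the placeholder drug names, and the choice to concatenate the blocks are hypothetical, and individual published models may consume the encodings differently.

```python
import numpy as np

def one_hot_triplets(triplets, drugs, cell_lines):
    """Encode (drug_a, drug_b, cell_line) triplets as concatenated one-hot blocks.

    Each row holds three indicator blocks: first drug, second drug, cell line.
    The vectors carry no chemical or molecular information, only identity.
    """
    drug_index = {d: i for i, d in enumerate(drugs)}
    cell_index = {c: i for i, c in enumerate(cell_lines)}
    n_d, n_c = len(drugs), len(cell_lines)
    X = np.zeros((len(triplets), 2 * n_d + n_c), dtype=np.float32)
    for row, (a, b, c) in enumerate(triplets):
        X[row, drug_index[a]] = 1.0              # block 1: first drug
        X[row, n_d + drug_index[b]] = 1.0        # block 2: second drug
        X[row, 2 * n_d + cell_index[c]] = 1.0    # block 3: cell line
    return X

# Hypothetical toy panel; the drug names are placeholders.
drugs = ["drug_A", "drug_B", "drug_C"]
cell_lines = ["PC3", "MCF7"]
triplets = [
    ("drug_A", "drug_B", "PC3"),
    ("drug_A", "drug_C", "MCF7"),
    ("drug_B", "drug_C", "PC3"),
]
X = one_hot_triplets(triplets, drugs, cell_lines)
```

Because such an encoding has no entry for a drug or cell line outside the panel, a model that performs equally well on it cannot be relying on information that would transfer to unseen entities.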

The scripts to run the experiments are available at: https://github.com/tastanlab/ohe

## Full-text entities

- **Diseases:** LPO (MESH:D000070591), Cancer (MESH:D009369)
- **Chemicals:** LCO (-)
- **Species:** Homo sapiens (human, species) [taxon 9606], Bos taurus (bovine, species) [taxon 9913]
- **Mutations:** V100S
- **Cell lines:** PC3 (CVCL_0035): Homo sapiens (Human), Prostate carcinoma, Cancer cell line; MCF7 (CVCL_0031): Homo sapiens (Human), Invasive breast carcinoma of no special type, Cancer cell line

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13005728/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13005728/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC13005728/full.md

---
Source: https://tomesphere.com/paper/PMC13005728