ArcDFI: Attention regularization guided by CYP450 interactions for predicting drug-food interactions

Mogan Gim; Jaewoo Kang; Donghyeon Park; Minji Jeon

PMC · DOI: 10.1371/journal.pcbi.1013055 · October 15, 2025

Open Access

TL;DR

ArcDFI is a new AI model that predicts how drugs interact with food by considering liver enzymes called CYP450, improving prediction accuracy and explainability.

Contribution

To the authors' knowledge, ArcDFI is the first model to incorporate drug-CYP450 interactions for predicting drug-food interactions, enhancing generalizability and model explainability.

Findings

1. ArcDFI outperforms ten baseline models in predicting drug-food interactions under cold-drug and cold-food settings.
2. The model's attention mechanism reveals molecular features linked to drug-CYP450 interactions and DFI predictions.
3. Incorporating CYP450 data improves predictive generalizability and provides insights into drug-food interaction mechanisms.

Abstract

CYP450 isoenzymes are known to be deeply involved in the formation of drug-food interactions (DFI). Previously introduced computational approaches for predicting DFIs do not take drug-CYP450 interactions (DCI) into account and have limited generalizability in handling compounds unseen during model training. We introduce ArcDFI, a model that utilizes attention regularization guided by CYP450 interactions to predict drug-food interactions. Experiments on DFI prediction—evaluated under stringent cold-drug and cold-food settings—show that our model outperforms ten baseline approaches, demonstrating the effectiveness of incorporating CYP450 interactions. Analysis of its attention mechanism provides insight into its current understanding of DCI and how they are related to its DFI predictions. To the best of our knowledge, ArcDFI is the first DFI prediction model that incorporates the concept…


Figures

Fig 1 — Overview of ArcDFI. 1) This image was obtained from https://commons.wikimedia.org/wiki/File:Capsule_icon.svg. 2) This image was obtained from https://commons.wikimedia.org/wiki/File:Noun-drugs-1511305-00449F.svg. 3) This image was obtained from https://f1000research.com/articles/4-178 and is licensed under Creative Commons Attribution License, https://doi.org/10.12688/f1000research.6314.1.

Fig 2 — Descriptive illustration of ArcDFI. a) Model architecture for ArcDFI. The parameters in Compound Substructure Encoder, Compound Graph Encoder and Compound-CYP Interaction Block are shared by both drug and food compounds. b) Detailed illustration of the Compound-CYP interaction block. c) Detailed illustration of the Cross-Modality Fusion Layer. The drug (food) compound-CYP interaction embedding is combined with the food (drug) compound graph embedding using a vector-wise outer product, followed by concatenation of the two embeddings.

Fig 3 — Detailed illustration for Attention Regularization Auxiliary Loss Objective.

Fig 4 — Analysis of ArcDFI’s Compound-CYP Interaction Block for the drug-food compound pair Sulfonylurea and Salicylate. (a), (b): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to substrate-type interactions for Sulfonylurea, respectively. (c), (d): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to inhibition-type interactions for Sulfonylurea, respectively. (e), (f): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to substrate-type interactions for Salicylate, respectively. (g), (h): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to inhibition-type interactions for Salicylate, respectively.

Fig 5 — Analysis of ArcDFI’s Compound-CYP Interaction Block for the drug-food compound pair Midazolam and Berberine. (a), (b): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to substrate-type interactions for Midazolam, respectively. (c), (d): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to inhibition-type interactions for Midazolam, respectively. (e), (f): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to substrate-type interactions for Berberine, respectively. (g), (h): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to inhibition-type interactions for Berberine, respectively.

Funding

—Hankuk University of Foreign Studies Research Fund (2024)
—Korea Bio Data Station (K-BDS)
—Korean Ministry of Education (MOE)
—Seoul Metropolitan Government
—National Research Foundation of Korea
—MSIT under the ICAN program supervised by the IITP
—Drug Discovery R&D Center
—Advanced Biomedical Engineering Research Group [O2515711] at Korea University Anam Hospital

Full text

Introduction

Individuals commonly use medications to maintain health or treat diseases, but drug efficacy can be significantly influenced by dietary factors. Food components may alter drug absorption, metabolism, and excretion; such alterations, known as drug-food interactions (DFIs), can cause unexpected side effects. DFIs occur via three primary mechanisms: incompatibility, pharmacokinetics (PK), or pharmacodynamics (PD) [1]. Incompatibility involves physical or chemical interactions that affect drug stability or bioavailability; e.g., tetracycline binds with calcium in dairy products, reducing its absorption [2]. PK interactions affect drug absorption, distribution, metabolism, or excretion; for example, grapefruit juice inhibits CYP3A4, increasing blood levels of statins and the associated risk of toxicity [3]. PD interactions modify drug effects at target receptors, enhancing or counteracting intended outcomes, as seen with vitamin K-rich foods diminishing warfarin’s anticoagulant effect [4]. Understanding and managing DFIs is thus essential for optimizing drug therapy and ensuring patient safety.

Identifying novel DFIs is critical not only clinically but also during drug development, where early detection can improve safety profiles and accelerate approval. However, experimental identification remains difficult due to the vast number of possible drug-food pairs and complex biochemical mechanisms [5]. Therefore, computational models have emerged as scalable alternatives. DFinder, for instance, uses a graph-based method integrating compound features and topological information from a large heterogeneous graph of drugs and food [5]. DFI-MS introduces a multilevel feature optimization and contrastive learning framework [6]. While promising, both models struggle to generalize to unseen drugs or food compounds. DFinder’s reliance on a fixed interaction network and DFI-MS’s compound-wise embedding layer, which holds a fixed set of representations, hinder their ability to predict interactions for novel inputs, a critical challenge for real-world applications, particularly with new therapeutics. These limitations underscore the need for more flexible, generalizable representation approaches.

Among DFI mechanisms, cytochrome P450 (CYP450)-mediated interactions are especially important, as these enzymes metabolize 75–80% of clinical drugs [7] and account for 60–70% of DFIs [8,9]. For example, grapefruit juice inhibits CYP3A4, increasing plasma levels of statins and antidepressants [10], while St. John’s Wort induces it, reducing the efficacy of medications such as contraceptives and immunosuppressants [11]. These examples highlight the importance of incorporating CYP450 mechanisms when studying DFIs [12].

Despite CYP450’s significance, no prior computational models have explicitly used CYP450-related information for DFI prediction. This gap stems from sparse annotations for compound-CYP450 interactions. While databases such as DrugBank [13] and the Flockhart Table [14] provide some data on drug-CYP450 interactions (e.g., substrates, inhibitors), coverage remains limited, particularly for food compounds. Consequently, incorporating this critical information into existing models has been challenging. Developing sparsity-resilient models that leverage known CYP450 annotations while uncovering novel interactions is thus an important research direction.

Cross-attention mechanisms are popular in deep learning for modeling inter-modal relationships and enhancing interpretability through attention scores [15]. These scores, based on embedding similarity between query and key elements, highlight important interactions. However, attention scores can be ambiguous when no key is relevant to a given query element, undermining interpretability [16]. ArkDTA, a drug-target interaction model, mitigates this by using attention regularization: it introduces a surrogate pseudo-embedding that absorbs attention when no explicit relationship exists in the labeled data [17].
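The ambiguity arises because softmax-normalized attention weights for a query always sum to one, so they must land somewhere even when no key is relevant. The minimal sketch below illustrates this with toy 2-dimensional vectors and dot-product scores (both are assumptions chosen for brevity, not ArcDFI's actual scoring function). The `pseudo_key` vector is hypothetical; it stands in for a trainable pseudo-embedding that gives the "no relationship" case somewhere to put its attention mass.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys):
    """Dot-product attention weights of one query over a list of keys."""
    return softmax([dot(query, k) for k in keys])


# A query with no affinity to any real key: every score is 0.
query = [1.0, 0.0]
real_keys = [[0.0, 1.0], [0.0, -1.0], [0.0, 0.5]]

w_real = attend(query, real_keys)
print([round(w, 3) for w in w_real])  # [0.333, 0.333, 0.333] -- forced to be uniform

# Append a hypothetical pseudo-key that can absorb the attention mass.
pseudo_key = [3.0, 0.0]
w_all = attend(query, real_keys + [pseudo_key])
mass_on_real = sum(w_all[:-1])
print(round(mass_on_real, 3))  # 0.13 -- most of the mass now sits on the pseudo-key
```

Without the pseudo-key, the uniform weights would falsely suggest that every real key matters equally; with it, low total mass on the real keys becomes a readable "no interaction" signal.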

We extend this approach to DFI prediction using CYP450 information. Here, ground-truth labels (compound-CYP450 interactions) guide attention regularization. The pseudo-embedding absorbs attention if a compound lacks known interaction with the CYP450 isoenzymes, improving both prediction and interpretability. Unlike ArkDTA, however, our study faces a highly sparse set of ground truth labels for CYP450 interactions, particularly for food compounds. To address this, we apply semi-supervised learning, using attention regularization only where DCI annotations exist and defaulting to standard unsupervised cross-attention otherwise. As a result, our model can predict DFIs for novel compounds while attributing interactions to specific CYP450 isoenzymes.

In this paper, we present Attention Regularization guided by CYP450 Interactions for predicting Drug-Food Interactions (ArcDFI), a novel deep-learning framework that robustly predicts DFIs and enhances interpretability via CYP450-based attention mechanisms (Fig 1). Our key contributions are:

1. Inspired by ArkDTA [17], we propose ArcDFI, which employs attention regularization to modulate its attention mechanism based on available information about drug-CYP450 interactions (DCIs).
2. We demonstrate ArcDFI’s generalizability in predicting DFIs involving unseen drugs or food compounds based on its experimental results.
3. Using the attention mechanism, we demonstrate ArcDFI’s explainability by identifying which compound substructures are relevant for predicting DFIs and which CYP450 isoenzymes are likely to form DCIs with the compound.

Fig 1. Overview of ArcDFI.

Materials and methods

DFI dataset construction

The main data sources for constructing the DFI dataset were FooDrugs [18] and FDMine [19]. The latest version of the FooDrugs dataset (v4) contains over 500,000 DFIs, either collected from textual documents using natural language processing techniques or inferred from gene expression data using similarity profile analysis. The FDMine dataset contains binary-labeled pairwise interactions for 787 unique drugs and 563 unique food compounds. It is a comprehensive dataset built from two large-scale data sources: DrugBank [13] and the Food Database (FooDB [20]). Note that food compounds differ from food items: the former are chemical substances, while the latter are composites of food compounds. Although the data sources also contain food item information associated with food compounds, our work focuses on predicting interactions between drug compounds and food compounds. Therefore, we collected only compound-level relationship data from these sources when constructing our large-scale DFI dataset.

After eliminating DFIs that contain invalid compounds, we integrated both data sources into a large-scale DFI prediction dataset, denoted as the ArcDFI dataset. Positive labels indicate that a drug-food compound pair forms a metabolism-related interaction, while negative labels indicate its absence. Table 1 shows the overall statistics for the three datasets, including ours.

Table 1: Detailed statistics for each dataset where our newly constructed ArcDFI dataset is an integration of FooDrugs and FDMine.

Drug-CYP450 interaction label annotation

Our approach centers on modeling CYP450-mediated DFIs, which requires CYP450 interaction features for both drug and food compounds. To augment our DFI dataset with CYP450-related information, we first gathered drug compounds annotated with CYP450 interaction labels from three data sources. The DrugBank database [13] and the Drug Interactions Flockhart Table [14] provide drug-CYP450 interactions (DCIs), specifying the CYP450 isoenzymes involved and the interaction types. In addition, the authors of [21] released a CYP450 interaction dataset that contains both positive (presence of DCI) and negative (absence of DCI) labels for each interaction type.

After integrating the three data sources, we constructed a drug-CYP450 interaction (DCI) dataset covering five CYP450 isoenzymes (CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4) and two interaction types (substrate and inhibition). Although other CYP450 isoenzymes (e.g., CYP2E1) and interaction types (e.g., induction) exist, we selected only those with relatively more available annotations. The DCI dataset was then used to annotate the drug compounds in the DFI dataset with their corresponding CYP450 interaction labels. Table 2 shows the number of drug compounds annotated for each CYP450 isoenzyme and interaction type.
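One way to picture the resulting annotation is a 2 × 5 grid per compound (interaction types × isoenzymes) in which each slot holds a positive label, a negative label, or nothing at all. The sketch below assumes a hypothetical dictionary of merged annotations keyed by `(compound_id, isoenzyme, type)`; the identifiers `drug_A` and `food_X` are made up. Using `None` for missing labels keeps "no annotation" distinct from "annotated as no interaction", which is the distinction the semi-supervised auxiliary loss later relies on.

```python
from typing import Dict, List, Optional, Tuple

ISOENZYMES = ("CYP1A2", "CYP2C9", "CYP2C19", "CYP2D6", "CYP3A4")
INTERACTION_TYPES = ("substrate", "inhibition")

# Hypothetical merged annotations: (compound_id, isoenzyme, type) -> 1 (DCI) or 0 (no DCI).
RawLabels = Dict[Tuple[str, str, str], int]


def label_grid(compound_id: str, raw: RawLabels) -> List[List[Optional[int]]]:
    """Return a 2 x 5 grid (interaction types x isoenzymes); None marks a missing label."""
    return [[raw.get((compound_id, iso, t)) for iso in ISOENZYMES] for t in INTERACTION_TYPES]


raw = {
    ("drug_A", "CYP3A4", "substrate"): 1,   # annotated substrate
    ("drug_A", "CYP2D6", "inhibition"): 0,  # annotated non-inhibitor
}
grid = label_grid("drug_A", raw)
for t, row in zip(INTERACTION_TYPES, grid):
    print(t, row)
# substrate [None, None, None, None, 1]
# inhibition [None, None, None, 0, None]
```

Most food compounds would come out as an all-`None` grid, reflecting the label sparsity discussed in the Introduction.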

Table 2: Number of drug compounds annotated with each type of CYP450 isoenzyme (1A2, 3A4, 2C19, 2C9, 2D6) interaction (substrate, inhibition) label.

Model architecture for ArcDFI

As shown in Fig 2a, ArcDFI comprises the following components: Compound substructure encoder, Compound graph encoder, CYP450 encoder, Compound-CYP interaction block, Cross-modality fusion layer, and Drug-food interaction prediction layer. Given the SMILES representations of a drug compound and a food compound, it predicts a drug-food interaction score. Throughout this paper, the term "compound" refers collectively to both drug compounds and food compounds.

Fig 2. Descriptive illustration of ArcDFI: a) model architecture; b) Compound-CYP interaction block; c) Cross-Modality Fusion Layer.

The key design choices behind ArcDFI are as follows:

Parameter-efficient Model Design: As DFI prediction involves two types of compound-level data entities (drug and food compounds), we incorporated weight-sharing strategies into the architecture of ArcDFI. Specifically, both compound types are propagated through shared Compound Encoder modules and a shared Compound-CYP Interaction block. This design reduces the number of trainable parameters and helps mitigate overfitting, particularly given the limited supervision available for food compound interactions.

Substructure-based Representation: We adopted ECFPs as the primary molecular representation due to their proven utility in capturing meaningful chemical substructures. ECFPs are robust, interpretable, and easily derived from canonical SMILES, making them well-suited for substructure-level attention modeling. We focused attention on substructure–enzyme interactions, as previous studies (including ArkDTA [17]) suggest these provide more interpretable and biologically meaningful insights than atom-level interactions.

Attention Regularization guided by CYP450 Interactions: To improve interpretability and inject biochemical prior knowledge, we adapted the attention regularization technique proposed in ArkDTA. While ArkDTA employed a single trainable pseudo-embedding for regularization, our method benefited empirically from using multiple pseudo-embeddings. Additionally, we assigned distinct roles to the attention heads: one head is regularized with CYP450 inhibition signals, another with substrate signals, and the remaining heads are left unregularized to retain representational flexibility.

List of mathematical notations.

For ease of reference, we list the mathematical notations below.

d: Drug compound
f: Food compound
$[eqn]$: Chemical substructures of a compound
$[eqn]$: Molecular graph representation of a compound
$[eqn]$: CYP450 isoenzymes
$[eqn]$: Trainable pseudo-substructures
$[eqn]$: Initial set of $[eqn]$ embeddings
$[eqn]$: Initial compound graph representation
$[eqn]$: Set of 5 amino-acid sequences of $[eqn]$
$[eqn]$: Set of encoded chemical substructure embeddings for a compound
$[eqn]$: Molecular graph embedding for a compound
$[eqn]$: Set of 5 refined CYP450 isoenzyme embeddings
$[eqn]$: Set of pseudo-substructure embeddings
$[eqn]$: Drug compound-integrated CYP450 embedding
$[eqn]$: Food compound-integrated CYP450 embedding
$[eqn]$: CYP450-mediated drug-food fusion embedding
$[eqn]$: Batch of DFI score predictions
$[eqn]$: Batch of ground-truth DFI labels
$[eqn]$: Attention weight calculated between the jth CYP450 isoenzyme and the kth compound substructure in the ith attention head
$[eqn]$: Batch of CYP450 interaction score predictions
$[eqn]$: Batch of ground-truth CYP450 interaction labels

Compound substructure encoder.

The Compound substructure encoder first takes a SMILES-represented compound as input and converts it into a 1024-bit Extended Connectivity Fingerprint (ECFP) with a radius of 2. Because each bit of the binary ECFP indicates the presence of a particular substructure within that radius, we converted each compound into a set of 2h-dimensional trainable substructure embeddings, one for each substructure present [17]. These initial substructure embeddings for the drug and food compounds were then propagated through an MLP, resulting in sets of n_d and n_f encoded chemical substructure embeddings, respectively, where d and f denote a drug compound and a food compound ( $[eqn]$ , $[eqn]$ ).
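The bit-to-embedding conversion can be sketched as a lookup table with one row per fingerprint bit, from which only the active bits are read. This is a minimal stdlib illustration under stated assumptions: the on-bit lists are hypothetical, the table is randomly initialized rather than trained, and the class name `SubstructureEmbedding` is illustrative. In practice the on-bits could come from RDKit, for example `AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024).GetOnBits()`.

```python
import random
from typing import List, Sequence

N_BITS = 1024  # ECFP length used in the paper (radius 2)
H = 128        # hidden size h reported in the paper; substructure embeddings are 2h-dim


class SubstructureEmbedding:
    """Lookup table with one 2h-dimensional vector per ECFP bit position.

    Weights are randomly initialized here; in a trained model they are learned.
    A single table is shared by drug and food compounds (weight sharing)."""

    def __init__(self, n_bits: int = N_BITS, dim: int = 2 * H, seed: int = 0):
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_bits)]

    def __call__(self, on_bits: Sequence[int]) -> List[List[float]]:
        # One embedding per active bit, so the set size n varies from compound to compound.
        return [self.table[b] for b in sorted(set(on_bits))]


embed = SubstructureEmbedding()
drug_bits = [1, 80, 314, 650, 1019]  # hypothetical on-bits of a drug's ECFP
food_bits = [33, 314, 875]           # hypothetical on-bits of a food compound's ECFP
S_d, S_f = embed(drug_bits), embed(food_bits)
print(len(S_d), len(S_f), len(S_d[0]))  # 5 3 256
```

Because the table is shared, a substructure that appears in both compounds (bit 314 here) maps to the same vector for the drug and the food compound.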

The Compound substructure encoder that takes the ECFPs from SMILES representation of drugs and food compounds as inputs is mathematically expressed as follows:

[eqn]

[eqn]

where $[eqn]$ is an initial set of 2h-dimensional chemical substructure embeddings converted from the compound’s ECFP representation. $[eqn]$ comprises a linear transformation layer ( $[eqn]$ ) followed by a non-linear activation function $[eqn]$ and a $[eqn]$ layer, whose dropout rate is set to 0.3 throughout the model.

Compound graph encoder.

The Compound graph encoder also takes a SMILES-represented compound as input and converts it into a molecular graph representation ( $[eqn]$ ) compatible with the encoder’s Graph Isomorphism Network (GIN) convolution layer, which additionally incorporates edge attributes [22]. The initial node (atom) features are the atomic number, chirality, degree, formal charge, number of hydrogen atoms, number of radical electrons, hybridization, aromaticity, and ring membership. The initial edge (bond) features are the bond type, stereo configuration, and conjugation.

Atom-wise node embeddings of the given compound are computed by the GIN convolution layer from the initial atom and bond features and the graph topology. They are subsequently aggregated by a global mean pooling layer into a single h-dimensional molecular graph embedding. The node embeddings and the graph embedding are denoted $[eqn]$ and $[eqn]$ , respectively.

The Compound graph encoder, which uses the SMILES representation of the drug and food compounds as input is mathematically expressed as follows:

[eqn]

[eqn]

where $[eqn]$ is the molecular graph of the input compound previously converted from its SMILES representation and contains the initial atom and bond features. GINEConv is the GIN convolution layer using edge attribute features, and MeanPool is a global mean pooling layer that aggregates the node embeddings $[eqn]$ to a single molecular graph embedding $[eqn]$ .
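For intuition, the sketch below performs one GINE-style update followed by global mean pooling on a toy three-atom chain. It follows the update rule documented for PyTorch Geometric's `GINEConv`, x_i' = h((1 + eps)·x_i + Σ_j ReLU(x_j + e_ji)), but, as a simplifying assumption, takes h to be the identity instead of a trainable MLP and uses bond features that already match the node dimension. The toy features are made up.

```python
from typing import Dict, List, Tuple

Vec = List[float]


def relu(v: Vec) -> Vec:
    return [max(0.0, x) for x in v]


def add(a: Vec, b: Vec) -> Vec:
    return [x + y for x, y in zip(a, b)]


def scale(v: Vec, c: float) -> Vec:
    return [c * x for x in v]


def gine_layer(x: List[Vec], edges: Dict[Tuple[int, int], Vec], eps: float = 0.0) -> List[Vec]:
    """One GINE update: x_i' = h((1 + eps) * x_i + sum_j ReLU(x_j + e_ji)).
    h is the identity here; the real layer applies a trainable MLP."""
    out = []
    for i, xi in enumerate(x):
        agg = [0.0] * len(xi)
        for (src, dst), e in edges.items():
            if dst == i:  # message from neighbor src into node i, conditioned on the bond
                agg = add(agg, relu(add(x[src], e)))
        out.append(add(scale(xi, 1.0 + eps), agg))
    return out


def mean_pool(x: List[Vec]) -> Vec:
    """Global mean pooling: average node embeddings into one graph embedding."""
    n = len(x)
    return [sum(col) / n for col in zip(*x)]


# Toy 3-atom chain 0-1-2; each undirected bond is stored in both directions.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bond = [0.5, -0.5]  # hypothetical bond feature, same size as the node features
edges = {(0, 1): bond, (1, 0): bond, (1, 2): bond, (2, 1): bond}

h_nodes = gine_layer(nodes, edges)
g = mean_pool(h_nodes)
print(h_nodes)  # [[1.5, 0.5], [3.0, 1.5], [1.5, 1.5]]
print(g)
```

Mean pooling makes the graph embedding independent of atom ordering and of molecule size, which suits compounds of very different sizes.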

CYP450 encoder.

The CYP450 encoder takes a set of five CYP450 isoenzymes initially represented as amino acid sequences as inputs and converts them into a set of contextualized CYP450 isoenzyme embeddings. The input amino acid sequence representations for the isoenzymes were obtained from the UniProt database [23]. They were fed into ESM-2, a large-scale protein language transformer model trained on millions of protein sequences [24], which resulted in a set of five 480-dimensional language-based embeddings $[eqn]$ . Lastly, these language-based embeddings are propagated through an MLP, which results in a set of 5 h-dimensional refined protein embeddings $[eqn]$ . Note that the contextualized CYP450 isoenzyme embeddings are universally used for all drug-food compound pairs within the same batch during training.

The CYP450 encoder, which takes the five CYP450 isoenzymes $[eqn]$ , each represented as an input sequence of amino acids, is mathematically expressed as follows:

[eqn]

[eqn]

[eqn]

where $[eqn]$ are the language-based embeddings for the five CYP450 isoenzymes encoded by the protein language transformer model $[eqn]$ (esm2-t12-35M-UR50D). $[eqn]$ comprises a linear transformation layer ( $[eqn]$ ) followed by $[eqn]$ and a $[eqn]$ layer.

Compound-CYP interaction block.

The Compound-CYP interaction block (Fig 2b) computes multi-head cross-attention scores between the contextualized CYP450 isoenzyme embeddings, which serve as queries, and the chemical substructure embeddings, which serve as keys and values. Its design intuition mirrors a ligand (food or drug compound) binding to a target CYP450 isoenzyme and exhibiting either inhibitory or substrate-related effects. By interpreting the pairwise attention scores computed by the cross-attention mechanism, we can infer which substructures play a crucial role in DCIs.

A standard cross-attention mechanism computes pairwise attention weights between CYP450 isoenzymes (queries) and chemical substructure embeddings (keys) in an unsupervised manner. Because the weights for each query are softmax-normalized to sum to one, they are always distributed over the keys, regardless of whether the compound actually interacts with the isoenzyme. To address this issue, we adopted ArkDTA’s attention regularization method, which modulates the distribution of attention weights through an auxiliary loss objective [17].

Specifically, a set of $[eqn]$ trainable h-dimensional pseudo-substructure embeddings is first appended to the set of chemical substructure embeddings. Attention weights assigned to the pseudo-substructures indicate the absence of a DCI, whereas weights assigned to the actual substructures indicate its presence. We treated the summed weights on the pseudo-substructures and on the actual substructures as the two class probabilities of a binary classification. These probabilities are fed to a binary cross-entropy loss function whose ground-truth labels are the compound-CYP450 interaction annotations from our DCI dataset.
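The mapping from attention weights to a binary interaction probability can be sketched for a single query (one isoenzyme in one regularized head). This is a minimal illustration, assuming hypothetical pre-softmax attention logits and an illustrative count of m = 2 pseudo-substructures; the paper's actual m and logits are not reproduced here. The probability of a DCI is the softmax mass on the n actual substructures, the complementary mass on the pseudo-substructures is the "no DCI" probability, and the pair feeds a binary cross-entropy.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def interaction_probability(real_scores: List[float], pseudo_scores: List[float]) -> float:
    """Softmax over actual + pseudo substructures; the mass on the actual ones is P(DCI)."""
    weights = softmax(real_scores + pseudo_scores)
    return sum(weights[: len(real_scores)])


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one probability/label pair (clipped to avoid log(0))."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


# Hypothetical attention logits of one isoenzyme query in a regularized head.
real = [2.0, 0.5, 1.0]  # n = 3 actual substructures
pseudo = [0.0, 0.0]     # m = 2 pseudo-substructures (illustrative value)
p = interaction_probability(real, pseudo)
print(round(p, 3), round(bce(p, 1), 3), round(bce(p, 0), 3))
```

A positive label (y = 1) rewards pushing attention onto the actual substructures; a negative label rewards shifting it onto the pseudo-substructures.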

Because our study includes two types of DCI, substrate and inhibition, we applied supervised attention regularization to the attention weights of the first and second heads of the block, corresponding to substrate and inhibition effects, respectively. The remaining attention heads were left unsupervised during training. The output of the Compound-CYP interaction block is obtained by mean pooling the five h-dimensional attention-based CYP450 isoenzyme embeddings $[eqn]$ . We denote this final output as the compound-integrated CYP450 embedding $[eqn]$ . The loss objective is explained in detail in the Auxiliary Loss Objective section.

The Compound-CYP interaction block that takes the set of five contextualized CYP450 isoenzymes $[eqn]$ and n chemical substructure embeddings $[eqn]$ as inputs is mathematically expressed as follows:

[eqn]

[eqn]

[eqn]

[eqn]

[eqn]

[eqn]

where $[eqn]$ is a multihead attention block borrowed from the Set Transformer architecture [25] enhanced by set normalization layers and equivariant skip connections [26]. The attention block includes a multihead attention layer, where the input query, key, and value embeddings are $[eqn]$ , $[eqn]$ and $[eqn]$ respectively. It employs four attention heads with pairwise attention weights computed using additive attention. Note that the computed attention weights from the first two heads are fed to an auxiliary loss objective. $[eqn]$ is a row-wise feedforward layer consisting of two layers of linear transformation ( $[eqn]$ , $[eqn]$ ) each followed by Gaussian Error Linear Unit ( $[eqn]$ ) activation function.
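The text states that the attention block uses additive attention but does not spell out the scoring function here. The sketch below assumes the common Bahdanau-style form score(q, k) = v · tanh(W_q q + W_k k) for a single head, with toy 2-dimensional parameters that are purely hypothetical. It also assumes values equal to keys, and it omits the Set Transformer's skip connections, set normalization, and row-wise feedforward layer, keeping only the attention and the final mean pooling over isoenzyme queries.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]


def matvec(W: Mat, x: Vec) -> Vec:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(s: List[float]) -> List[float]:
    m = max(s)
    e = [math.exp(x - m) for x in s]
    z = sum(e)
    return [x / z for x in e]


def additive_attention(queries: List[Vec], keys: List[Vec], Wq: Mat, Wk: Mat, v: Vec) -> List[List[float]]:
    """One head of additive attention: score(q, k) = v . tanh(Wq q + Wk k),
    softmax-normalized over the keys for each query (one row per query)."""
    rows = []
    for q in queries:
        qp = matvec(Wq, q)
        scores = []
        for k in keys:
            hidden = [math.tanh(a + b) for a, b in zip(qp, matvec(Wk, k))]
            scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
        rows.append(softmax(scores))
    return rows


def weighted_sum(weights: List[float], values: List[Vec]) -> Vec:
    return [sum(w * val[d] for w, val in zip(weights, values)) for d in range(len(values[0]))]


# Toy parameters for a single head (hypothetical; ArcDFI uses four heads and h = 128).
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[0.5, 0.0], [0.0, 0.5]]
v = [1.0, -1.0]
cyp_queries = [[1.0, 0.0], [0.0, 1.0]]  # 2 isoenzyme queries (ArcDFI has 5)
substructure_keys = [[1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]

attn = additive_attention(cyp_queries, substructure_keys, Wq, Wk, v)
# Each query yields one attended vector; mean pooling over the queries
# gives a single compound-integrated vector.
attended = [weighted_sum(row, substructure_keys) for row in attn]
pooled = [sum(col) / len(attended) for col in zip(*attended)]
print(attn)
print(pooled)
```

Each row of `attn` is one isoenzyme's distribution over substructures; these rows are the quantities that attention regularization supervises in the first two heads.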

Cross-modality fusion layer.

To effectively model the complex interactions between CYP450 isoenzymes, drug compounds, and food compounds, we implemented a Cross-modality fusion layer (Fig 2c) that combines attention-based drug-CYP450 integrative features $[eqn]$ with the food compound’s graph topological features $[eqn]$ and vice versa ( $[eqn]$ ). Recall that $[eqn]$ and $[eqn]$ represent molecular graph embeddings for the drug and food compound, respectively, whereas $[eqn]$ , $[eqn]$ denote the CYP450 embeddings integrated with the drug and food compound, respectively.

The fusion process applies a vector-wise outer product between the two heterogeneous embeddings on each side of the drug-food pair. Each outer product result is reshaped into a single h²-dimensional embedding. Finally, the two reshaped embeddings are concatenated, producing the final output of this layer: a 2h²-dimensional CYP450-mediated drug-food fusion embedding $[eqn]$ .

The Cross-modality fusion layer that takes the compound-integrated CYP450 isoenzyme embeddings $[eqn]$ and molecular graph embeddings $[eqn]$ as input is mathematically expressed as follows,

[eqn]

where $[eqn]$ , $[eqn]$ , and $[eqn]$ denote, respectively, reshaping a $[eqn]$ matrix into a $[eqn]$ vector, the vector-wise outer product, and concatenation.
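The fusion step amounts to two flattened outer products placed side by side. The sketch below uses plain lists and a toy size h = 3; the function names are illustrative, and putting the drug-side product first in the concatenation is an assumption. With the paper's h = 128, the fused vector would have 2 × 128² = 32,768 entries.

```python
from typing import List

Vec = List[float]


def outer_flat(a: Vec, b: Vec) -> Vec:
    """Vector-wise outer product a b^T, reshaped row-major into an h*h vector."""
    return [x * y for x in a for y in b]


def fuse(c_drug: Vec, g_drug: Vec, c_food: Vec, g_food: Vec) -> Vec:
    """Pair each compound's CYP450-integrated embedding with the partner's graph
    embedding, flatten both outer products, and concatenate (2*h*h values)."""
    return outer_flat(c_drug, g_food) + outer_flat(c_food, g_drug)


h = 3
c_d, g_d = [1.0, 2.0, 3.0], [0.0, 1.0, 0.0]  # toy drug embeddings
c_f, g_f = [1.0, 0.0, 0.0], [2.0, 0.0, 1.0]  # toy food embeddings
z = fuse(c_d, g_d, c_f, g_f)
print(len(z))  # 18, i.e. 2 * h * h
```

The outer product exposes every pairwise product between a CYP450-integrated feature and a partner graph feature, so the downstream layers can weigh feature interactions directly rather than only individual features.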

Drug-food interaction prediction layer.

Finally, the Drug-food interaction prediction layer, a stack of feedforward layers, takes the CYP450-mediated drug-food fusion embedding as input and predicts the interaction likelihood between the drug and food compound. This can be mathematically expressed as follows:

[eqn]

[eqn]

[eqn]

[eqn]

where $[eqn]$ and $[eqn]$ refer to the batch normalization and sigmoid functions, respectively. Weights and bias for the linear transformation layers are $[eqn]$ and $[eqn]$ respectively.

Model optimization

ArcDFI is trained with two loss objectives: a primary loss objective for predicting DFIs and an auxiliary loss objective for attention regularization. The former is optimized in a fully supervised setting, whereas the latter is semi-supervised because CYP450 interaction labels are available for only a subset of compounds.

Primary loss objective.

The batch-wise primary loss objective for drug-food interaction prediction, treated as a binary classification task, is mathematically expressed as follows:

[eqn]

where $[eqn]$ and $[eqn]$ are a batch of predicted DFI scores and the corresponding ground-truth binary labels, respectively, and b is the batch size.
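As a worked example of the batch-wise binary cross-entropy, the sketch below averages the per-pair loss over a toy batch of b = 3 predictions. The clipping constant `eps` is an assumption added to avoid log(0); deep-learning frameworks apply similar safeguards internally.

```python
import math
from typing import Sequence


def primary_loss(y_hat: Sequence[float], y: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over a batch of b DFI predictions."""
    assert len(y_hat) == len(y) and len(y) > 0
    total = 0.0
    for p, t in zip(y_hat, y):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y)


# (-ln 0.9 - ln 0.8 - ln 0.6) / 3
print(round(primary_loss([0.9, 0.2, 0.6], [1, 0, 1]), 4))  # 0.2798
```

Confident, correct predictions (0.9 for a positive pair, 0.2 for a negative pair) contribute little, while the hesitant 0.6 for a positive pair dominates the batch loss.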

Auxiliary loss objective.

To enhance ArcDFI’s understanding of CYP450-mediated drug-food interactions through semi-supervised learning, we introduced an auxiliary loss objective that regularizes the attention mechanism. Let $[eqn]$ denote the attention weight computed between the jth CYP450 isoenzyme and the kth compound substructure in the ith attention head of the Compound-CYP interaction block. The value of $[eqn]$ can be interpreted as the likelihood that the kth compound substructure contributes to a substrate (i = 1) or inhibitory (i = 2) interaction with the jth CYP450 isoenzyme. To clarify, we applied attention regularization to the first ( $[eqn]$ ) and second ( $[eqn]$ ) attention heads, whereas the remaining heads were left unsupervised.

As described earlier, the Compound-CYP interaction block module appends a universal set of $[eqn]$ trainable pseudo-substructure embeddings to the current set of n key substructure embeddings resulting in a total of n + $[eqn]$ embeddings involved in pairwise attention weight computation with respect to the five query CYP450 isoenzyme embeddings. Given $[eqn]$ , if the partner compound forms a substrate binding interaction with the jth CYP450 isoenzyme according to the ground truth label in our DCI dataset, our designed auxiliary loss objective encourages the distribution of attention weights to concentrate on the n actual substructure embeddings $[eqn]$ , while reducing the weights for the $[eqn]$ pseudo-substructure embeddings, $[eqn]$ . Similarly, this modulation applies to $[eqn]$ for compounds exhibiting inhibitory effects on the jth CYP450 isoenzyme.

Conversely, when the ground truth label for a specific DCI type is negative, the loss objective promotes the opposite behavior by diverting attention away from the actual substructure embeddings. If the DCI label is unavailable, the loss objective is not applied, leaving the attention distribution unsupervised. Note that the calculated pairwise attention weights for each row in $[eqn]$ are normalized using a softmax function, making them inherently suitable for optimization using a cross-entropy loss in the auxiliary objective.

The batch-wise auxiliary loss objective for attention regularization based on compound-CYP450 interactions is mathematically expressed as follows:

[eqn]

where $[eqn]$ and $[eqn]$ are a batch of predicted interaction scores and the corresponding ground-truth binary labels, respectively, for each CYP450 isoenzyme and interaction type. The interaction scores $[eqn]$ are calculated from the attention weights extracted from the Compound-CYP interaction block.

Because only a small proportion of drug compounds carry CYP450 isoenzyme interaction labels, the auxiliary loss objective operates in a semi-supervised setting. When interaction labels are unavailable, the corresponding predicted interaction scores remain unsupervised, so only drug compounds with available interaction labels contribute to the auxiliary loss that updates the parameters of the Compound-CYP interaction block. Fig 3 illustrates the auxiliary loss objective.
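The semi-supervised behavior can be sketched as a binary cross-entropy that skips missing labels. This is a minimal illustration, assuming a hypothetical layout where each row is a compound and each column an (interaction type, isoenzyme) slot, with `None` for unannotated slots; averaging over annotated entries only, and returning 0.0 when a batch has no annotations at all, are assumptions rather than details given in the text.

```python
import math
from typing import List, Optional


def masked_aux_loss(p_hat: List[List[float]], labels: List[List[Optional[int]]],
                    eps: float = 1e-7) -> float:
    """BCE over attention-derived DCI scores, averaged only over annotated slots.
    Rows are compounds; columns are (interaction type, isoenzyme) slots, i.e. 2 x 5 = 10
    in ArcDFI (3 here for brevity). None labels contribute nothing, so those
    attention rows stay unsupervised."""
    total, count = 0.0, 0
    for row_p, row_y in zip(p_hat, labels):
        for p, y in zip(row_p, row_y):
            if y is None:
                continue  # no annotation: skip, leaving this slot unsupervised
            p = min(max(p, eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


# Two compounds x 3 slots; the second compound (e.g. a food compound) is unannotated.
p_hat = [[0.8, 0.3, 0.5], [0.9, 0.1, 0.7]]
labels = [[1, 0, None], [None, None, None]]
print(round(masked_aux_loss(p_hat, labels), 4))  # 0.2899, i.e. (-ln 0.8 - ln 0.7) / 2
```

Because unannotated slots never enter the sum, changing the predicted scores of the unannotated compound leaves the auxiliary loss unchanged; those compounds are still trained through the primary DFI loss.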

Detailed illustration of the Attention Regularization Auxiliary Loss Objective.

Training ArcDFI.

Finally, the batch-wise total loss objective for training ArcDFI is expressed as follows:

[eqn]

where α is the auxiliary loss coefficient for controlling the intensity of attention regularization based on CYP450 interaction.
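Assuming the total objective adds the auxiliary term, scaled by α, to a binary cross-entropy DFI loss (both the additive form and the BCE choice are assumptions here, since the equation itself is not reproduced), the combination can be sketched in plain Python. The function names are hypothetical; α defaults to 5, the value reported in the training details.

```python
import math


def binary_cross_entropy(p: float, y: int, eps: float = 1e-7) -> float:
    """BCE for one predicted DFI probability p and binary label y."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def total_loss(dfi_probs, dfi_labels, aux_loss: float, alpha: float = 5.0) -> float:
    """Assumed form: L_total = L_DFI + alpha * L_aux, with L_DFI a batch-mean BCE."""
    l_dfi = sum(binary_cross_entropy(p, y) for p, y in zip(dfi_probs, dfi_labels)) / len(dfi_labels)
    return l_dfi + alpha * aux_loss


print(round(total_loss([0.5, 0.5], [1, 0], aux_loss=0.1), 4))  # → 1.1931
```

A larger α strengthens the attention regularization relative to the DFI objective; α = 0 corresponds to an ablated model without the auxiliary loss.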

We trained ArcDFI for up to 100 epochs with early stopping, using the AdamW optimizer. The hyperparameters include the batch size, learning rate, weight decay, auxiliary loss coefficient, hidden embedding dimensionality, and number of pseudo-substructure embeddings. The batch size (b), learning rate, and weight decay were set to 1024, 0.0001, and 0.0001, respectively. Early stopping monitored the validation loss with a patience of 10 epochs. The auxiliary loss coefficient α was set to 5. For the model configuration, the hidden embedding dimensionality and the number of pseudo-substructure embeddings were set to h = 128 and $[eqn]$ , respectively. All model and training hyperparameters were kept identical across every experimental setting involving ArcDFI. We initially adopted the hyperparameter settings reported in the ArkDTA paper and then refined them through a series of experiments based on performance on the validation set.
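The reported training settings can be collected in a configuration object, with early stopping on the validation loss sketched in plain Python. The class and field names are hypothetical, the number of pseudo-substructure embeddings is omitted because its value is not recoverable from this text, and the rule that only a strictly lower validation loss counts as an improvement is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training hyperparameters reported for ArcDFI (field names are illustrative)."""
    max_epochs: int = 100
    batch_size: int = 1024
    learning_rate: float = 1e-4
    weight_decay: float = 1e-4
    aux_loss_coef: float = 5.0  # alpha
    hidden_dim: int = 128       # h
    patience: int = 10          # early-stopping patience on validation loss


class EarlyStopping:
    """Stop after `patience` consecutive epochs without a lower validation loss."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def simulate(val_losses, cfg: TrainConfig = TrainConfig()):
    """Return (stopping epoch, best validation loss) for a sequence of losses."""
    stopper = EarlyStopping(cfg.patience)
    epoch = 0
    for epoch, loss in enumerate(val_losses[: cfg.max_epochs], start=1):
        if stopper.step(loss):
            break
    return epoch, stopper.best


print(simulate([1.0, 0.8, 0.7] + [0.75] * 20))  # → (13, 0.7)
```

In a PyTorch implementation, these values would typically be passed to `torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate, weight_decay=cfg.weight_decay)`, with `EarlyStopping.step` called once per epoch after validation.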

Results

Evaluation on DFI prediction

To evaluate ArcDFI’s DFI prediction performance, we first selected DFinder [5] and DFI-MS [6] as our main baselines, since both were originally designed to predict DFIs from a given pair of drug and food compounds. Because DFI prediction is underexplored compared with DDI prediction, we also included baselines that were not designed specifically for DFIs but are compatible with the task, as they learn representations from pairs of input compounds. The following DDI prediction models were included in our DFI prediction experiments: DeepSynergy [27], DeepDDI [54], EPGCN-DS [29], CASTER [30], SSI-DDI [31], MatchMaker [32], MR-GNN [33], and DeepDrug [28].

Unlike previous DFI studies, we adopted a stricter evaluation approach in which the dataset was split according to compound clusters. We first applied Butina clustering to group the drug and food compounds in the DFI dataset [34], and then created two data splits based on the drug or food compound clusters, referred to as the cold drug and cold food settings, respectively. This cluster-based splitting approach has been widely used in drug-target interaction prediction research [17,35,36]. The resulting numbers of drug-food compound pairs assigned to the training, validation, and test partitions in each cold setting are as follows:

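The cluster-based cold split described above can be sketched in plain Python. This is an illustration, not the authors' pipeline: fingerprints are modeled as sets of on-bits, the Tanimoto distance cutoff of 0.4 and the 80/10/10 partition fractions are assumed values, and the function names are hypothetical. Whole clusters are assigned to a single partition so that no test compound has a close analog in the training set.

```python
import random
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def butina_clusters(fps, cutoff=0.4):
    """Taylor-Butina clustering on Tanimoto distance (1 - similarity).

    Compounds within `cutoff` distance are neighbors. Candidates are visited in
    order of decreasing neighbor count; each unassigned candidate becomes a
    centroid and absorbs its still-unassigned neighbors.
    """
    n = len(fps)
    neighbors = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if 1.0 - tanimoto(fps[i], fps[j]) <= cutoff:
            neighbors[i].add(j)
            neighbors[j].add(i)
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    assigned, clusters = set(), []
    for i in order:
        if i in assigned:
            continue
        members = [i] + sorted(j for j in neighbors[i] if j not in assigned)
        assigned.update(members)
        clusters.append(tuple(members))
    return clusters


def cold_split(clusters, fractions=(0.8, 0.1), seed=0):
    """Assign whole clusters to train/valid/test so no cluster spans two partitions."""
    clusters = list(clusters)
    random.Random(seed).shuffle(clusters)
    total = sum(len(c) for c in clusters)
    parts = {"train": set(), "valid": set(), "test": set()}
    for c in clusters:
        if len(parts["train"]) < fractions[0] * total:
            parts["train"].update(c)
        elif len(parts["valid"]) < fractions[1] * total:
            parts["valid"].update(c)
        else:
            parts["test"].update(c)
    return parts


fps = [frozenset({1, 2, 3}), frozenset({1, 2, 3, 4}), frozenset({10, 11}),
       frozenset({10, 11, 12}), frozenset({20})]
clusters = butina_clusters(fps)
print(clusters)  # → [(0, 1), (2, 3), (4,)]
print(cold_split(clusters))
```

In a cold drug split, each drug-food pair would presumably follow the partition of its drug (and of its food compound in the cold food split). RDKit offers a comparable clustering routine in `rdkit.ML.Cluster.Butina.ClusterData`.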

We repeated the experiments three times using different random seeds for each model and data-split setting to ensure stability and robustness. We used five evaluation metrics to measure the predictive performance: Area Under ROC Curve (AUROC), Area Under Precision-Recall Curve (AUPRC), F1-score (F1), Precision, and Recall. The final scores for each model were averaged across three runs for each split setting.
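For reference, the five metrics listed above can be computed in plain Python as below. AUPRC is computed here as average precision (the convention used by scikit-learn's `average_precision_score`), and the 0.5 decision threshold for precision, recall, and F1 is an assumption; this section does not specify which variants were used. The function names are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(labels, scores):
    """Average precision: sum over thresholds of (recall increase) * precision."""
    ranked = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:  # group tied scores
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def precision_recall_f1(labels, scores, threshold=0.5):
    """Threshold-based precision, recall, and F1 for binary DFI predictions."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


labels, scores = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores), round(auprc(labels, scores), 4), precision_recall_f1(labels, scores))
# → 0.75 0.8333 (1.0, 0.5, 0.6666666666666666)
```

Each metric would be computed separately for every run and then averaged over the three random seeds.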

During the DFI experiments, all models, including the baselines, were trained and evaluated on the same DFI dataset, split according to the evaluation setting (cold drug or cold food). All baseline models were obtained from their respective GitHub repositories and trained with their default hyperparameters.

Tables 3 and 4 show the quantitative results for the cold drug and cold food settings, respectively. In the cold drug setting, ArcDFI outperformed all baseline models as well as its own ablated version, particularly in terms of AUROC, AUPRC, and F1. The cold food setting posed a harder challenge for all DFI models; here, ArcDFI achieved the second-best performance, while its ablated version, in which the auxiliary loss objective was removed, outperformed all other models. These results indicate that the proposed attention regularization method improves generalizability to unseen drug compounds. In contrast, because our dataset contains no CYP450 interaction information for food compounds, the Compound-CYP interaction block processes food compounds with purely unsupervised cross-attention, which limits ArcDFI’s generalizability to unseen foods.

Table 3: Evaluation results for ArcDFI, its ablated version, and baseline models under the cold drug experiment setting. All evaluation scores were averaged over three iterations along with their standard deviation. Best results are bold-faced.

Table 4: Evaluation results for ArcDFI, its ablated version and baseline models under the cold food experiment setting. All evaluation scores were averaged over three iterations along with their standard deviation. Best results are bold-faced.

Furthermore, our substructure-wise encoding method builds dense embeddings that generalize robustly to novel drug scaffolds. The benefit of substructure-level encoding is also supported by SSI-DDI, a DDI prediction model that incorporates molecular substructures, which achieved the second-highest AUROC, AUPRC, and F1 scores in the cold drug setting.

In contrast, DeepDrug outperformed SSI-DDI in the cold food experiments. We attribute this to the generally simpler scaffolds of many food compounds, which contain fewer heavy atoms and for which substructure-based features may lose representational power. Unlike SSI-DDI and ArcDFI, DeepDrug builds atom-level and bond-level embeddings derived from SMILES sequences. This allows it to better capture fine-grained atomic context and local chemical environments, which can be especially advantageous when modeling interactions involving small, structurally simple molecules. These observations highlight the strength of our shared compound encoding strategy, while also suggesting that incorporating localized, atom-level features could further improve generalizability to novel food-derived compounds.

Moreover, our model design, which combines protein language modeling for CYP450 isoenzymes, a mixture of two compound modalities (molecular graph and substructures), and parameter sharing between the two compound inputs (Chemical Substructure Embedding Layer, Compound Graph Encoder, and CYP Interaction Block), proved effective for this DFI prediction task. Whereas the baseline models use neither CYP450 information nor cross-modality fusion, the ArcDFI architecture, with or without the auxiliary loss, achieved the best performance in both the cold drug and cold food settings.

Although DFinder and DFI-MS were expected to outperform the DDI prediction models, both performed poorly in the cold drug and cold food settings. We attribute this to their model designs. DFinder is a graph network-based model that relies on network topology to learn compound features as node embeddings, so compounds absent from the training graph lack informative embeddings. Similarly, DFI-MS depends on a set of embeddings that are only partially updated during training, which prevents it from generalizing well to novel compounds. Note that both models must be retrained to handle newly introduced compounds, whereas ArcDFI can be applied to such compounds directly at inference time. Under this strict evaluation setting, ArcDFI demonstrated overall robustness to unfamiliar drug and food compounds, highlighting its potential for profiling newly developed medications for food interaction risks.

Analysis on attention weights

To investigate the relationships between CYP450 isoenzymes and drug and food compounds, we ran model inference and visualized the attention weights extracted from ArcDFI’s Compound-CYP interaction block for each compound (drug, food) and head (substrate, inhibition) as heatmaps, shown in Fig 4a, 4b, 4e, 4f and Fig 5a, 5b, 5e, 5f. Red cells indicate the pairwise interaction scores between the compound and each CYP450 isoenzyme for a specific interaction type (substrate or inhibition). Each pairwise score is the sum of the attention weights assigned to the compound’s substructures with respect to the corresponding CYP450 isoenzyme. Blue cells indicate the contribution of each chemical substructure of the compound to the CYP450 isoenzyme interactions. The chemical substructures in the heatmaps are represented as SMILES.

Analysis of ArcDFI’s Compound-CYP Interaction Block for the drug-food compound pair Sulfonylurea and Salicylate. (a), (b): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to substrate-type interactions for Sulfonylurea, respectively. (c), (d): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to inhibition-type interactions for Sulfonylurea, respectively. (e), (f): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to substrate-type interactions for Salicylate, respectively. (g), (h): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to inhibition-type interactions for Salicylate, respectively.

Analysis of ArcDFI’s Compound-CYP Interaction Block for the drug-food compound pair Midazolam and Berberine. (a), (b): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to substrate-type interactions for Midazolam, respectively. (c), (d): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to inhibition-type interactions for Midazolam, respectively. (e), (f): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to substrate-type interactions for Berberine, respectively. (g), (h): Attention weights and highlighted compound substructures extracted from ArcDFI’s Compound-CYP Interaction Block head related to inhibition-type interactions for Berberine, respectively.

Because the substructure-wise attention weights assigned to each CYP450 isoenzyme may not by themselves provide sufficient interpretability, we also visualized each compound with its important substructural regions highlighted in red, as shown in Fig 4c, 4d, 4g, 4h and Fig 5c, 5d, 5g, 5h. We first extracted the top three attended substructures from the heatmaps and mapped their atoms onto the molecule. We then highlighted these atoms in red to indicate the substructural regions focused on by ArcDFI’s Compound-CYP interaction block.
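The top-three highlighting step can be sketched as follows. The sketch assumes that a substructure's importance is its attention weight summed over the isoenzymes in one heatmap, and that the atom indices covered by each substructure are already known (for example, from the `bitInfo` output RDKit can return for Morgan fingerprints); both are assumptions, and the function names are hypothetical.

```python
def top_k_substructures(attn, k=3):
    """Indices of the k substructures with the largest attention summed over isoenzymes.

    attn: list of rows, one per CYP450 isoenzyme, each holding one weight per substructure
    """
    totals = [sum(row[j] for row in attn) for j in range(len(attn[0]))]
    return sorted(range(len(totals)), key=totals.__getitem__, reverse=True)[:k]


def atoms_to_highlight(attn, substructure_atoms, k=3):
    """Union of atom indices covered by the top-k attended substructures."""
    atoms = set()
    for j in top_k_substructures(attn, k):
        atoms.update(substructure_atoms[j])
    return sorted(atoms)


attn = [[0.1, 0.5, 0.2, 0.2],   # isoenzyme 1
        [0.3, 0.3, 0.1, 0.3]]   # isoenzyme 2
substructure_atoms = [(0, 1), (1, 2, 3), (4, 5), (3, 6)]
print(top_k_substructures(attn), atoms_to_highlight(attn, substructure_atoms))
# → [1, 3, 0] [0, 1, 2, 3, 6]
```

The resulting atom list could then be passed as the `highlightAtoms` argument of RDKit's `Draw.MolToImage` to color those atoms.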

Fig 4 displays the attention weights and highlighted substructures extracted from ArcDFI’s Compound-CYP interaction block when given an unseen drug compound and a seen food compound, Sulfonylurea and Salicylate, as input. Sulfonylurea is a drug-like compound traditionally used to treat diabetes, while Salicylate is an organic acid known to exhibit analgesic, antipyretic, and anti-inflammatory effects [37,38].

According to Fig 4a, the attention weights suggest that Sulfonylurea forms a DCI with CYP1A2 and CYP2C19 [39] as a metabolized substrate (scores of 0.65 and 0.66, respectively), with its N=S(=O)=O substructure appearing to have a relatively high influence on this interaction. In contrast, Fig 4b indicates that Sulfonylurea exerts a relatively weak inhibitory effect on the CYP450 isoenzymes, with CYP2D6 being the most affected (0.37).

In Fig 4e, the weights indicate that Salicylate also forms DCIs with CYP1A2 and CYP2C19 as a substrate (0.75 and 0.76, respectively). Because the two compounds share these isoenzymes as substrates, this may have stronger implications for a DFI between them. In addition, the attention weights in Fig 4f indicate that Salicylate has no inhibitory interactions with the CYP450 isoenzymes. Interestingly, CYP2D6 received the highest score among the five isoenzymes (0.44), as it did for Sulfonylurea.

The highlighted substructures in Fig 4c and 4d indicate that most substructural regions of Sulfonylurea are perceived to contribute to the overall attention weights, which is consistent with sulfonylurea serving as a parent structure from which other drug compounds are derived. As shown in Fig 4g and 4h, the aromatic substructures of Salicylate received the most attention overall, which may suggest π-π interactions with the target CYP450 isoenzymes.

Fig 5a, 5b, 5e, and 5f show the attention weights visualized when ArcDFI was given an out-of-dataset DFI pair, Midazolam and Berberine, as input (neither compound appears in our DFI dataset). Midazolam is a sedative used for surgical purposes [40] and is widely used as a probe substrate for the CYP3A4 isoenzyme [41], while Berberine is a plant-derived organic compound known to have anti-cancer effects [42].

As shown in Fig 5a and 5e, the attention weights indicate that both Midazolam and Berberine interact with three CYP450 isoenzymes (CYP1A2, CYP3A4, and CYP2C19) as substrates. This is consistent with our earlier observation that ArcDFI predicts an interaction for a drug-food pair when both compounds are substrates of the same CYP450 isoenzymes. However, as shown in Fig 5c and 5g, the two compounds share no common molecular substructures with similar DCI patterns.

ArcDFI also identified Midazolam as a possible inhibitor of CYP1A2 and CYP2D6 (0.55 and 0.50), as shown in Fig 5b. Although this prediction requires further investigation, it illustrates how ArcDFI could suggest novel DCIs and DFIs for unexplored drug-food pairs.

In conclusion, ArcDFI can highlight the parts of a drug or food compound most likely to interact with particular CYP450 isoenzymes. By investigating the attention weights extracted from its Compound-CYP interaction block, we can derive various DCI-related hypotheses to explain or refine the DFI predictive framework.

Discussion

Limitations

One of the main limitations of this study lies in the constructed DFI and DCI datasets. Although the DFI dataset contains a seemingly large number of DFI pairs (609,180), it remains sparse: these pairs cover only about 0.4% of the 152,610,600 possible drug-food pairs. Despite our efforts to gather all available DCIs from multiple data sources, the number of drug compounds annotated with CYP450 isoenzyme interactions remains extremely small. Other interaction types (e.g., induction) and other CYP450 isoenzymes (e.g., CYP2E1) should also be considered. For example, some studies report that Salicylate induces CYP2C19 [43]; we expect the Compound-CYP interaction block to reflect this only if its attention regularization is supplemented with sufficient annotated DCI data for induction-type interactions.

The absence of CYP450 interaction labels for food compounds restricts the potential of the attention regularization method, as shown by the results of the cold food experiments. We suspect that, without direct supervision from annotated food-CYP450 interactions, the regularized attention weights introduce misleading signals that degrade the food compound representations. In addition, the attention weights related to the inhibition-type interactions of Berberine with CYP450 isoenzymes were inconsistent with experimental studies [44,45]. We attribute this to the Compound-CYP interaction block relying only on DCI information to modulate its attention mechanism, without guidance from food-CYP450 interaction (FCI) information. Given that the auxiliary loss objective helped ArcDFI achieve state-of-the-art performance in the cold drug setting, we expect similar benefits in the cold food setting once annotated FCI information is incorporated into the same auxiliary loss objective.

Another limitation relates to the use of compound substructures derived from the ECFPs of drug and food compounds. Although this approach is computationally efficient, the substructures do not map uniquely to structural information because of the hashing inherent to ECFPs: the same embedding may represent different molecular substructures. Additionally, the ECFPs we used are limited to a radius of 2, potentially overlooking larger yet meaningful substructures. Although the compound graph encoder partially mitigates these weaknesses, there is still room for improvement in how the substructures of a compound are represented.
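The collision issue can be illustrated with a small plain-Python sketch. It assumes that substructure identifiers are folded into a fixed-size vocabulary by taking them modulo the number of bits, as in bit-vector ECFPs; the identifiers below are toy values and the function names are hypothetical. Any slot holding more than one identifier corresponds to a single embedding shared by distinct substructures.

```python
from collections import defaultdict


def fold_identifiers(identifiers, n_bits=1024):
    """Map integer substructure identifiers to n_bits slots by modulo folding."""
    slots = defaultdict(set)
    for ident in identifiers:
        slots[ident % n_bits].add(ident)
    return dict(slots)


def colliding_slots(slots):
    """Slots (embedding indices) shared by more than one distinct substructure."""
    return {slot: sorted(ids) for slot, ids in slots.items() if len(ids) > 1}


# Toy identifiers: 5, 1029 and 2053 differ but share slot 5 when n_bits = 1024.
ids = [5, 1029, 2053, 77]
print(colliding_slots(fold_identifiers(ids)))  # → {5: [5, 1029, 2053]}
```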

The main premise of our study was that DFIs are primarily related to CYP450-mediated drug metabolism. While this assumption, implemented as the auxiliary loss objective in ArcDFI, yielded strong empirical performance in our cold drug experiments, many other factors influencing DFIs should be considered. Non-CYP enzymes and pathways are also involved in the oxidative metabolism of food compounds [46]. Non-enzymatic mechanisms such as pH-dependent drug solubility or chelation may also be associated with DFIs [47,48]. Certain foods can influence the gut microbiota, which may in turn alter the metabolism of drug compounds, and vice versa [49]. Furthermore, food can affect drug absorption by influencing the patient’s gastrointestinal physiology. Addressing these factors requires annotating not only DFI types but also their underlying mechanisms, such as the CYP450 isoenzymes or other relevant factors involved.

Another set of limitations stems from the scope of our task formulation and the practicability of ArcDFI. First, our current approach treats DFI prediction as a binary classification task—predicting the presence or absence of a drug–food interaction. While this formulation provides a simplified and tractable objective, it may not fully capture the multi-faceted nature of DFIs. An alternative could involve quantifying interaction strength or modeling interaction pathways between compounds. Moreover, one of the data sources used for training, FDMine, labels structurally dissimilar compound pairs as negative (non-interacting) examples, which may introduce excessively simplistic supervision. Ideally, negative samples should be curated based on explicit literature evidence, or the task itself could be reformulated as a drug–food interaction scoring problem rather than binary classification.

In addition, DFIs can be modulated by individual physiological factors such as gastrointestinal conditions or microbiota composition—both of which are highly dependent on a person’s genetic background and lifestyle habits. This inter-individual variability poses challenges for model generalizability and may limit the model’s utility in personalized drug prescription settings, where the same food–drug combination could have divergent effects across different individuals.

Finally, we conducted an external validation experiment using 29 drug–food compound pairs curated from the DDID database [50]. While constructing a fully out-of-distribution dataset proved difficult due to the comprehensive coverage of our original dataset, we ensured that all compound pairs in the external set were novel, even if the individual drugs and food compounds had appeared separately in the training data. ArcDFI achieved an accuracy of 0.7241 and an AUROC of 0.5895, indicating limited robustness. However, due to the small number of samples in the external set, the AUROC may be unstable and not fully reflective of the model’s generalization ability. Nevertheless, these results still highlight the necessity of incorporating annotated food–CYP450 interaction (FCI) data to better guide the attention mechanism and improve the model’s generalization not only to novel compounds but also to new compound pairings.

Future work

We propose several directions for future improvements to ArcDFI. First, we plan to use automated text-mining approaches based on large language models to collect more experimentally known DFIs and CYP450 interactions from the biomedical literature, and to improve dataset quality through manual curation by domain experts. Furthermore, we plan to employ molecular docking simulations and binding affinity prediction tools to estimate labels for the DCIs that are currently unlabeled in our datasets. These additional data sources would enhance the utility of attention regularization, thereby improving the training and predictive performance of ArcDFI.

Second, we plan to improve the ArcDFI architecture, specifically its molecular representation learning and attention regularization method. In particular, we plan to explore alternative representations of compound substructures, such as pharmacophores, molecular fragmentation (BRICS decomposition [51], RECAP algorithm [52]), or SELFIES [53]. Moreover, we plan to implement task-specific modifications to the attention layers and to devise robust optimization strategies that maximize the synergy between attention regularization and DFI prediction.

Lastly, we plan to conduct biological experiments to verify ArcDFI’s DFI predictions and the DCI hypotheses derived from its attention mechanism for out-of-dataset drug-food pairs. This step is critical for strengthening the reliability and applicability of the model in real-world scenarios and for bridging computational predictions with clinical practice.

Conclusion

We introduced ArcDFI, a novel DFI prediction model that incorporates attention regularization guided by compound-CYP450 interactions to improve generalizability and interpretability. To assess its generalizability to unseen drug and food compounds, we conducted experiments under the cold drug and cold food settings. The DFI prediction results show ArcDFI’s strong performance against 10 baseline approaches.

In addition, the attention weights extracted from the Compound-CYP interaction block offered insights into novel drug-food and compound-CYP450 relationships. We expect ArcDFI to facilitate the discovery of novel drug-food interactions. Despite the challenges posed by dataset sparsity and the limited availability of CYP450 interaction labels, our model showed promising results in terms of both predictive performance and interpretability.

Bibliography


1. Telessy I. Let’s keep an eye on food-drug interaction. Food Nutr J. 2018;8:2575–7091.
2. Neuvonen PJ. Interactions with the absorption of tetracyclines. Drugs. 1976;11(1):45–54. doi: 10.2165/00003495-197611010-00004. PMID: 946598.
3. Lilja JJ, Kivistö KT, Neuvonen PJ. Grapefruit juice-simvastatin interaction: effect on serum concentrations of simvastatin, simvastatin acid, and HMG-CoA reductase inhibitors. Clin Pharmacol Ther. 1998;64(5):477–83. doi: 10.1016/S0009-9236(98)90130-8. PMID: 9834039.
4. Hirsh J, Fuster V, Ansell J, Halperin JL, American Heart Association, American College of Cardiology Foundation. American Heart Association/American College of Cardiology Foundation guide to warfarin therapy. Circulation. 2003;107(12):1692–711. doi: 10.1161/01.CIR.0000063575.17904.4E. PMID: 12668507.
5. Wang T, Yang J, Xiao Y, Wang J, Wang Y, Zeng X, et al. DFinder: a novel end-to-end graph embedding-based method to identify drug-food interactions. Bioinformatics. 2023;39(1):btac837. doi: 10.1093/bioinformatics/btac837. PMID: 36579885. PMCID: PMC9828147.
6. Wei J, Li Z, Zhuo L, Fu X, Wang M, Li K, et al. Enhancing drug-food interaction prediction with precision representations through multilevel self-supervised learning. Comput Biol Med. 2024;171:108104. doi: 10.1016/j.compbiomed.2024.108104. PMID: 38335821.
7. Guengerich FP. Cytochrome P450 and chemical toxicology. Chem Res Toxicol. 2008;21(1):70–83. doi: 10.1021/tx700079z. PMID: 18052394.
8. Zanger UM, Schwab M. Cytochrome P450 enzymes in drug metabolism: regulation of gene expression, enzyme activities, and impact of genetic variation. Pharmacol Ther. 2013;138(1):103–41. doi: 10.1016/j.pharmthera.2012.12.007. PMID: 23333322.