Multi-AOP: a lightweight multi-view deep learning framework for antioxidant peptide discovery
Jianxiu Cai, Xinpo Lou, Chak Fong Chong, Deepa Alex, Joel P. Arrais, Yapeng Wang, Shirley W. I. Siu

TL;DR
Multi-AOP is a new deep learning framework that improves the discovery of antioxidant peptides by combining sequence and molecular graph features.
Contribution
Introduces a lightweight multi-view deep learning framework for AOP discovery that outperforms existing methods.
Findings
Multi-AOP achieves high accuracy on benchmark datasets (0.8043, 0.9684, and 0.9043).
Combining sequence and graph features improves AOP prediction performance.
A unified AOP dataset was created to support future model development.
Abstract
Antioxidant peptides (AOPs), with their strong free radical scavenging ability and health benefits, have emerged as promising candidates for disease prevention and food preservation. However, traditional experimental approaches to AOP discovery remain hindered by inefficiencies and substantial resource demands. Here, we present Multi-AOP, a parameter lightweight multi-view deep learning framework (0.75 million parameters) that enhances AOP discovery through integrated sequence and graph learning. We employ Extended Long Short-Term Memory (xLSTM) to generate sequence embeddings. Concurrently, we transform peptide sequences into SMILES representations and extract molecular graph features using a Message Passing Neural Network (MPNN), capturing intrinsic physicochemical properties. By leveraging both sequence patterns and structural information through hierarchical fusion, Multi-AOP…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4- —Macao Polytechnic University
- —the Science and Technology Development Fund from Macao S.A.R.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMachine Learning in Bioinformatics · Protein Hydrolysis and Bioactive Peptides · Computational Drug Discovery Methods
Introduction
Reactive oxygen species (ROS) are generated during cellular metabolism and are normally neutralized by the antioxidant defense system (Apel and Hirt 2004). However, excess ROS compromise redox homeostasis, cause tissue damage, and contribute to chronic diseases (Lian-Jiu et al. 2019; Jomova et al. 2023). Oxidative damage also affects food quality (Choe and Min 2005), pharmaceutical stability (Saravanakumar et al. 2017), and cosmetic shelf-life (Othman et al. 2020)
While synthetic antioxidants are effective, concerns over their potential toxicity have prompted a shift towards natural alternatives (Pop et al. 2013; Khezerlou et al. 2022). Antioxidant peptides (AOPs) have emerged as promising candidates due to their free radical-scavenging activity and additional bioactive properties (Elias et al. 2008). However, traditional experimental approaches–including chemical assays, cell-based methods, and in vivo models (Irina Georgiana Munteanu and Constantin Apetrei 2021)–remain time-intensive and resource-demanding. This limitation has driven the development of computational methods.
Recent advances in deep learning have driven significant progress in bioactive peptide prediction. For antimicrobial peptides, architectures such as recurrent neural networks (Velickovic et al. 2017) and attention-based models (Yan et al. 2023; Cai et al. 2025; Li et al. 2025) have achieved strong performance. Transfer learning using pre-trained protein language models (PLMs), including ProtBERT (Elnaggar et al. 2021) and ESM (Lin et al. 2022), has demonstrated effectiveness across multiple peptide classification tasks. Graph neural networks (GNNs) have also been applied to capture molecular structural information (Gilmer et al. 2020). More recently, hybrid frameworks combining sequence and graph representations, such as GraphPep (Tao et al. 2025) and Multi-peptide (Badrinarayanan et al. 2024), have shown promise for peptide property prediction.
For AOP prediction specifically, Olsen et al. developed AnOxPePred using convolutional neural networks (CNNs) with one-hot encoding (Olsen et al. 2020). Qin et al. implemented a bidirectional LSTM with amino acid descriptors (Qin et al. 2023), and Li et al. developed AOPP integrating multiple sequence features (Li et al. 2025). Despite this progress, existing AOP predictors predominantly rely on sequence composition and handcrafted features, neglecting structural information that influences peptide bioactivity. This limitation motivates multi-view learning approaches that integrate complementary feature representations to improve predictive performance and generalizability.
In this work, we introduce Multi-AOP, a parameter lightweight multi-view framework that integrates sequence embeddings from an xLSTM with molecular-graph features from a Message Passing Neural Network (MPNN). Unlike existing hybrid approaches that typically employ large PLMs, our framework uses a parameter-efficient xLSTM and MPNN backbone ( 0.75M parameters) suitable for limited AOP training data, combined with a hierarchical fusion strategy that effectively integrates sequence and structural features. We also provide a unified AOP dataset combining three public benchmarks to facilitate future model development.
Materials and methods
Dataset
We collected three distinct existing datasets as the benchmark datasets: the AnOxPePred dataset from Olson et al., the AnOxPP dataset from Qin et al., and the AOPP dataset from Li et al. The AnOxPePred dataset, sourced from the BIOPEP-UWM database in 2020, comprises 676 free radical scavenger (FRS) AOP as positive samples and 728 non-AOPs as negative samples. The negative set includes 218 experimentally validated non-AOP supplemented with 500 randomly generated sequences. The AnOxPP dataset, compiled in 2023 from the DFBP and BIOPEP-UWM databases, contains 1060 peptides with documented radical scavenging activities as positive samples, balanced with 1060 randomly generated sequences serving as negative samples. The most recent AOPP dataset collected in 2025 represents the most comprehensive collection. AOPP integrated data from multiple repositories including DFBP, BIOPEP-UWM, Antimicrobial Peptide Database, PlantPepDB, and FermFooDb. This dataset encompasses 1511 validated AOPs and an equivalent number of randomly generated sequences as negative controls.BIOPEP-UWM contains information on peptide sequences, source proteins, and biological activities. DFBP provides functional annotations including activity types and source organisms. The Antimicrobial Peptide Database includes peptide properties such as length, net charge, and hydrophobicity. PlantPepDB focuses on plant-derived peptides with source organism information. FermFooDB contains peptides derived from fermented foods with associated functional properties. A summary of the three benchmark datasets is provided in Table 1.
Our analysis of the experimentally validated AOPs in these datasets revealed subtle differences in their distributional patterns. As shown in Fig. 1, the amino acid composition of AOPs in the AnOxPePred dataset exhibited a higher proportion of histidine and a lower proportion of glutamine compared to the other two datasets. In contrast, AOPs the AnOxPP and AOPP datasets demonstrated more similar compositional profiles. These variations underscore the importance of using multiple benchmark datasets to assess the generalizability of models across diverse peptide populations. While the randomly generated negative samples were directly obtained from the three benchmark datasets. This approach is common in bioactive peptide prediction due to the scarcity of experimentally validated non-antioxidant peptides. However, random sequences may be easier to distinguish from AOPs than biologically relevant peptides, potentially leading to optimistic performance estimates.Fig. 1. Distribution of AOPs as a function of sequence length and amino acid composition. A–B for AnOxPePred dataset, C–D for AnOxPP dataset, E–F for AOPP datasetTable 1Summary of AOP data setsDatasetYearSourcesAOPsNon-AOPsTotalAnOxPePred dataset2020BIOPEP-UWM6767281404AnOxPP dataset2023DFBPBIOPEP-UWM106010602120AOPP dataset2025DFBPBIOPEP-UWMAntimicrobial Peptide databasePlantPepDBFermFoodBD151115113022
Proposed Multi-AOP
Sequence feature extraction: xLSTM
Peptides are biomolecules composed of amino acids linked by peptide bonds. The sequential nature of these structures has prompted extensive research into sequence models for extracting latent features from peptide primary structures. Long Short-Term Memory (LSTM) networks have proven particularly advantageous in this domain due to their capacity to handle sequential inputs through a memory state mechanism that correlates distinct segments of the sequence (Graves and Graves 2012). Here, we employed xLSTM (Beck et al. 2024), a state-of-the-art special extended LSTM model, to generate robust amino acid sequence features.
The classical LSTM incorporates a memory cell regulated by three gates: the input gate, forget gate, and output gate. At each time step t, the forget gate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f_t$$\end{document} determines the retention rate of previous information stored in memory cell \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_{t-1}$$\end{document} , while the input gate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i_t$$\end{document} controls the integration of new information \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{c}_t$$\end{document} into memory. These operations are defined by the following equations:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} f_t= & \sigma (W_f [h_{t-1}, x_t + b_f]), \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} i_t= & \sigma (W_i [h_{t-1}, x_t + b_i]), \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \widetilde{c}_t= & tanh(W_c \cdot [h_{t-1}, x_t] + b_c), \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h_{t-1}$$\end{document} represents the output of the previous cell, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_t$$\end{document} denotes the current input, and W and b represent weight matrices and bias vectors, respectively. The sigmoid function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma $$\end{document} and hyperbolic tangent function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tanh $$\end{document} are defined as:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \sigma (x)= & \frac{1}{1 + e^{-x}}, \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} tanh(x)= & \frac{1 - e^{-2x}}{1 + e^{-2x}}. \end{aligned}$$\end{document}The current memory cell state \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_t$$\end{document} is subsequently updated by integrating retained information from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_{t-1}$$\end{document} with new information \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{c}_t$$\end{document} , enabling selective memory retention or elimination according to:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} c_t= f_t \odot c_{t-1} + i_t \odot \widetilde{c}_t. \end{aligned}$$\end{document}Finally, the output gate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$o_t$$\end{document} determines which components of the memory state contribute to the current hidden state \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h_t$$\end{document} :
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} o_t= & \sigma (W_o [h_{t-1}, x_t + b_o]) , \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} h_t= & o_t \odot tanh(c_t), \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W_o$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_o$$\end{document} denote the corresponding weight matrix and bias vector.
The xLSTM architecture introduces several significant innovations that address the limitations of classical LSTM. First, it implements an improved gating mechanism for enhanced memory management through scalar LSTM (sLSTM). This mechanism replaces the standard sigmoid function in the input and forget gates with an exponential activation function. To mitigate potential numerical instability from large exponential values, the model stabilizes gates with an additional state \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m_t$$\end{document} :
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} m_t = \max (log(f_t) + m_{t-1}, log(i_t)) . \end{aligned}$$\end{document}This exponential gating mechanism enables more dynamic information filtering, improving both information retention and forgetting processes across temporal dependencies. Second, xLSTM introduces matrix LSTM (mLSTM), which replaces traditional scalar memory cells \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c \in \mathbb {R}$$\end{document} with matrix-based memory cells \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C \in \mathbb {R}^{d \times d}$$\end{document} . This matrix-based representation supports parallel computation and substantially enhances scalability. The incorporation of a covariance update rule in mLSTM further augments the model’s capacity to efficiently process larger datasets and longer sequences by capturing richer correlation patterns between sequence elements.
Furthermore, xLSTM stacks sLSTM and mLSTM blocks in a residual framework, enhancing model capacity while maintaining computational efficiency. While Transformers have become prevalent in many sequence modeling tasks, xLSTM demonstrates better parameter efficiency, requiring fewer trainable parameters than comparable Transformer models.
Graph feature extraction: MPNN
While sequence models like xLSTM excel at modeling the primary structure, graph models can encode information about secondary structural elements and interactions that influence antioxidant properties. MPNNs are a powerful class of graph neural networks (GNNs) designed to process graph-structured data, making them highly suitable for molecular modeling (Gilmer et al. 2020). In the context of our AOP task, AOPs are typically quite short, making it feasible to employ SMILES to represent the atomic and bonding structure of peptides. Then we applied RDKit, a widely used cheminformatics toolkit, to transform the SMILES representations of peptides into molecular graphs (Landrum 2013). In these graphs, atoms are represented as nodes, and the chemical bonds between them are modeled as edges. RDKit further facilitated the extraction of relevant node features and edge features. These features are subsequently fed into the MPNN.
The MPNN operates on graph-structured data by updating node representations through two main stages: the message passing phase and the readout phase. For an input graph data \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G=(V, E)$$\end{document} , where each vertex \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$v \in V$$\end{document} has an initial feature vector \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h^0_v$$\end{document} , and each edge \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(v, u) \in E$$\end{document} may carry an edge feature \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$e_{vu}$$\end{document} . At each time step t, the network uses a message function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$M_t$$\end{document} and an update function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$U_t$$\end{document} to propagate and transform messages across the graph. The message function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$M_t$$\end{document} and update function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$U_t$$\end{document} are defined as:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} M(h_v, h_w, e_{vw})= & (h_w, e_{ew}), \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} U_t(h^t_v, m^{t+1}_v)= & \sigma (H^{deg(v)}_t m^{t+1}_v), \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(\cdot , \cdot )$$\end{document} denotes concatenation, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma $$\end{document} is the sigmoid function, deg(v) is the degree of vertex v, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H^N_t$$\end{document} is a learned matrix for each time step t and the vertex degree N.
The message representation \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m^t_v$$\end{document} is given by
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} m^{t+1}_v = \sum _{u \in N(v)} M_t(h^t_v, h^t_u, e_{vu}), \end{aligned}$$\end{document}the message incorporates the node’s own state \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h^t_v$$\end{document} , the neighbor’s state \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h^t_u$$\end{document} , and the edge feature \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$e_{vu}$$\end{document} . Once the message \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m^{t+1}_v$$\end{document} is computed, the node updates its state via the update function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$U_t$$\end{document} resulting in
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} h^{t+1}_v = U_t(h^t_v, m^{t+1}_v), \end{aligned}$$\end{document}The update function modifies the feature representation of each node based on the incoming messages. Through multiple iterations of this process, information can propagate across the molecular graph. After T iterations of message passing, a readout function R aggregates all final node embeddings into a comprehensive graph representation:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \hat{y}_G = R(\{h^T_v \mid v \in V\}) \end{aligned}$$\end{document}The readout function typically employs permutation-invariant operations such as summation or maximization to ensure that the final representation is invariant to node ordering. The final representation encodes the global molecular properties that contribute to antioxidant activity. By leveraging MPNN for graph feature extraction, we enabled our model to capture the complex structural and electronic factors that determine the peptide antioxidant capacity.
Model design
A schematic representation of our proposed Multi-AOP framework is illustrated in Fig. 2. The architecture implements a dual-feature extraction approach, integrating both sequential and structural information to enhance AOP classification performance. For the sequence modeling component, we employed a char-int dictionary to tokenize and pad the sequence to ensure a uniform sequence length. Each input peptide sequence was encoded as a fixed-length integer vector \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X = [x_1, x_2,..., x_n]$$\end{document} , where each element \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x \in {0, 1,..., 19}$$\end{document} corresponds to a specific amino acid, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n = 50$$\end{document} represents the maximum sequence length. The sequence feature extraction network comprised three stacked xLSTM blocks followed by an average pooling mechanism, yielding a dense embedding vector of dimension of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1 \times 128$$\end{document} .
For structural feature extraction, we implemented a GNN approach to model the two-dimensional molecular topology. The peptide sequences were first converted to SMILES, which were subsequently transformed into molecular graph representations using RDKit (version 2025.09.01). The node and edge features provide a rich representation of the peptide chemical structure. The graph feature extraction network was composed of three message passing layers, a global readout layer, and an average pooling layer. The output extracted graph feature has a dimension of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1 \times 128$$\end{document} .
The integration of sequence and structural information was achieved through a hierarchical fusion of the respective feature vectors. The concatenated feature vector was then processed through two multilayer perceptrons (MLPs), each followed by a dropout regularization layer, facilitating information exchange between the complementary feature spaces. This multimodal integration enabled the model to leverage both the sequential patterns and structural motifs characteristic of antioxidant peptides.
To determine the optimal model configuration, we conducted comprehensive hyperparameter optimization through grid search. Our optimization strategy proceeded in two phases: first, we independently evaluated the performance of the xLSTM and graph model components, empirically establishing that three xLSTM blocks and three MPNNConv layers provided optimal feature extraction capabilities. Subsequently, we fine-tuned the critical hyperparameters of the fusion module, exploring various integration strategies and regularization parameters. The hyperparameter search space encompassed fusion methodologies, learning rate, weight decay rate, and dropout rate, with performance metrics summarized in Table 2. Model training was performed using the Adam optimizer with binary cross-entropy as the objective function. The final optimized architecture incorporated three xLSTM blocks and three MPNNConv layers, with a learning rate of 1e-5, a dropout rate of 2e-1, weight decay of 3e-3, and a batch size of 64.Table 2. Hyperparameters and candidate value ranges considered during model optimizationHyperparameterCandidate valueOptimal valueFusion strategiesconcatenation, cross attention, hierarchicalhierarchicalNumber of FC layers1, 2, 3, 43Weight decay rate1e-2, 3e-2,... 1e-3, 3e-3,...,7e-33e-3Dropout rate1e-1, 2e-1, 3e-1, ..., 9e-12e-1Learning rate1e-4, 1e-5, 1e-61e-5
Fig. 2. The architecture of Multi-AOP. Multi-AOP comprises two distinct modules. The sequence module employs xLSTM to aggregate the sequence feature from its sequence composition. The graph module employs MPNN to extract the sequence feature from its graph representation
Model performance metrics
We used four general performance metrics: Mathews correlation coefficient (MCC), accuracy, precision, and sensitivity. These metrics are calculated through four metrics: true positive (TP), true negative (TN), false positive (FP), and false negative (FN).
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} MCC= & \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) (TP +F N)(TN + FP)(TN + FN)}} \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} Accuracy= & \frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} Precision= & \frac{TP}{TP + FP} \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} Sensitivity= & \frac{TP}{TP + FN} \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} Specificity= & \frac{TN}{TN + FP} \end{aligned}$$\end{document}Results
Sequence model selection
To evaluate the efficacy of xLSTM for AOP prediction, we first compared its performance with Transformer-based protein language models (PLMs) such as ProtBERT (Elnaggar et al. 2021) and ESM2 (Lin et al. 2022). PLMs are known for their ability to capture long-range dependencies through self-attention mechanisms. PLMS have shown to be highly effective for tasks that require modeling complex structural and functional relationships in protein sequences. However, these models come with a significant drawback: their massive parameter sizes, often in the hundreds of millions. This large number of parameters makes PLMs prone to overfitting when applied to specialized tasks with limited data, such as AOP prediction.
In contrast, the xLSTM model offers a more parameter-efficient solution, containing only 0.70 million parameters–just 0.17% and 0.47% of the parameter sizes of ProtBERT and ESM2, respectively. With its substantially reduced parameters, the proposed xLSTM model achieved consistently competitive performance across all three benchmark datasets. Specifically, xLSTM attained classification accuracies of 0.7879, 0.9375, and 0.8804 on the AnOxPePred, AnOxPP, and AOPP datasets, respectively. This result demonstrates that xLSTM can provide an effective framework for modeling antioxidant peptide sequences without the risk of overfitting inherent in large-scale PLMs.
To ensure an accurate estimate of the performance of various models, we split each benchmark dataset into five partitions for cross-validation. The test was performed independently for each fold, using the remaining folds for training. The average performance of the test folds was reported. These comparisons are summarized in Table 3. Experimental results show that the xLSTM’s parameter-efficient design, advanced gating mechanisms, and specialized architecture make it a more effective and scalable solution for AOP prediction.
Moreover, we compared the xLSTM model to a vanilla multi-layer LSTM. The exponential gating mechanism and matrix-based memory cells in xLSTM offer significant improvements in feature extraction, even for short peptide sequences, where standard LSTMs are typically considered sufficient. As a result, xLSTM outperformed the vanilla LSTM across all three datasets, further underscoring its superior ability to predict AOPs.Table 3. Comparative results of the five-fold CV of ProtBERT, ESM2, and xLSTM sequence models across the three benchmark datasetsDatasetModelNumber ofparametersAccuracyPrecisionSensitivitySpecificityMCCAnOxPePreddatasetProtBERT420 M0.71980.76830.60000.86990.4493ESM2150 M0.78580.78550.76300.81510.5710LSTM0.63 M0.76210.77290.71200.80480.5205xLSTM0.70 M0.78790.79570.75260.80930.5755AnOxPPdatasetProtBERT420 M0.92920.94410.91320.94810.8592ESM2150 M0.93400.96460.90090.91280.8698LSTM0.63 M0.91870.92790.92970.90370.8238xLSTM0.70 M0.93750.94830.92570.95370.8754AOPPdatasetProtBERT420 M0.87690.92320.82240.93400.7585ESM2150 M0.87460.91880.82180.91400.7534LSTM0.63 M0.87100.90950.82890.90270.7336xLSTM0.70 M0.88040.91680.83660.93590.7637
Graph model selection
Graph Neural Networks (GNNs) have emerged as sophisticated tools for feature extraction from graph-structured data, particularly for molecular representations where nodes correspond to atoms and edges represent chemical bond interactions (Dauparas et al. 2022; Sumida et al. 2024). The relatively constrained length of AOPs enables effective SMILES transformation into molecular graphs without computational overhead, making graph-based approaches particularly suitable for this domain. Here, we examined five prominent GNN architectures: GraphSAGE (Hamilton et al. 2017), Graph Attention Networks (GATs) (Velickovic et al. 2017), Graph Isomorphism Networks (GINs) (Xu et al. 2018), MPNNs, and ChebNet (Tang et al. 2024). Each of these frameworks has previously demonstrated efficacy in modeling complex relationships between molecular structures and their biological activities through distinct computational mechanisms.
As mentioned, MPNNs operate through an iterative information exchange protocol wherein nodes update their representations by aggregating messages from neighboring nodes. This process enables the incorporation of both local chemical environment information and, through multiple iterations, global structural patterns. In contrast, GraphSAGE generates node embeddings by sampling and aggregating features from local neighborhoods, offering scalability advantages for large molecular graphs. GATs implement an attention-based mechanism that differentially weights neighboring nodes according to their contextual relevance. This adaptive focus allows GATs to prioritize particularly informative atomic interactions within the peptide structure, potentially highlighting functional groups critical for antioxidant activity. GINs, meanwhile, employ more mathematically expressive aggregation functions designed to enhance discriminative capabilities between isomorphically distinct graph structures–a property particularly valuable for precise molecular fingerprinting of peptides with similar compositions but different arrangements. ChebNet, proposed in 2024, utilizes spectral graph convolutions based on Chebyshev polynomials to capture multi-scale structural information across molecular topologies.
Table 4 presents a comprehensive performance analysis of these GNN architectures across our three benchmark datasets. Consistent with the experimental protocol described above, all models were evaluated using five-fold cross-validation on the three benchmark datasets. The MPNN architecture exhibited the highest overall performance among all evaluated graph models. The model achieved accuracies of 0.7473, 0.8703, and 0.8201 on the AnOxPePred, AnOxPP, and AOPP datasets, respectively. This performance can be attributed to MPNN’s effective balance between local chemical environment modeling and global structural pattern recognition capabilities in identifying the molecular motifs associated with antioxidant capacity in peptides.Table 4. Comparative results of the five-fold CV of MPNN, GAT, and GIN graph neural models across the three benchmark datasets.DatasetModelNumber ofparametersAccuracyPrecisionSensitivitySpecificityMCCAnOxPePreddatasetGraphSAGE23 k0.73950.73720.70370.77390.4440GAT54 k0.73310.70550.76300.63000.4684GIN19 k0.70820.73870.60740.72420.4178ChebNet32 k0.73570.73440.72740.77640.4674MPNN75 k0.74730.74620.71850.78710.4935AnOxPPdatasetGraphSAGE23 k0.83920.86340.87180.83470.7388GAT54 k0.81600.75570.93400.80920.6504GIN19 k0.66510.62320.83490.80070.3511ChebNet32 k0.81990.83100.85000.82990.7355MPNN75 k0.87030.84890.90090.83960.7420AOPPdatasetGraphSAGE23 k0.81780.82860.80330.83230.6368GAT54 k0.75250.69970.88450.84080.5235GIN19 k0.64520.62150.74260.72670.2961ChebNet32 k0.81310.83340.76760.84070.6330MPNN75 k0.82010.84400.78550.85480.6418
Comparative analysis of baseline prediction models
To establish benchmark performance, we conducted experiments using different machine learning (ML) and deep learning (DL) algorithms, in combination with various peptide feature encoding methods. For conventional ML algorithms, we selected five widely-used classifiers that represent diverse learning paradigms: Decision Tree (DT), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). DT was included as a simple, interpretable baseline. KNN represents instance-based learning that classifies samples based on similarity in feature space. SVM was selected for its effectiveness in high-dimensional spaces and its proven performance in peptide classification tasks. RF and XGBoost represent ensemble methods that combine multiple weak learners, with XGBoost being particularly effective for structured data. These algorithms were evaluated with twelve peptide descriptors commonly used in peptide property prediction, including Amino Acid Composition (AAC), Dipeptide Composition (DPC), Composition-Transition-Distribution (CTD), and Dipeptide Deviation from Expected Mean (DDE). A complete list of descriptors is provided in Supplementary Table S1.
In addition to conventional ML approaches, we included AnOxPePred, AOPP, and PepBERT, which are publicly available methods specifically developed for peptide property prediction. AnOxPePred employs one-hot encoding with convolutional neural networks (CNNs), representing a standard sequence-based approach. AOPP utilizes bidirectional LSTM (Bi-LSTM) with fused amino acid descriptors, representing a more advanced architecture that captures bidirectional sequential dependencies. PepBERT (Zhenjiao et al. 2025) is a BERT-based model pre-trained on peptide sequences from UniRef. Following its original implementation, we utilized PepBERT as a feature extractor to generate sequence embeddings and trained a logistic regression classifier for AOP prediction. It should be noted that the AnOxPP model was excluded from our comparative analysis due to the unavailability of its source code for replication.
Table 5 presents the comprehensive performance metrics of various ML and DL model-descriptor combinations across the three benchmark datasets. Interestingly, the combination of the SVM classifier with DDE descriptors demonstrated efficacy, surpassing the specialized DL models on the AnOxPePred and AnOxPP dataset. The SVM model achieved an accuracy of 0.7379 on the AnOxPePred dataset and an accuracy of 0.9528 on the AnOxPP dataset. The p-values from paired t-tests comparing the accuracy of the SVM models and our proposed Multi-AOP model are both less than 0.001 on the AnOxPePred, AnOxPP datasets, indicating that the improvement of Multi-AOP is statistically significant.
Among the evaluated specialized DL frameworks, AOPP consistently outperformed AnOxPePred across all three benchmark datasets. This performance gain can be attributed to its capability of Bi-LSTM architectures to capture bidirectional sequential dependencies in peptide sequences compared to the conventional CNNs. Compared to the next best DL method AOPP, our proposed Multi-AOP framework achieved further improvements in classification accuracies, reaching 0.8043 ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\uparrow $$\end{document} 9.00%), 0.9684 ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\uparrow $$\end{document} 1.64%), and 0.9043 ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\uparrow $$\end{document} 0.14%) on the AnOxPePred, AnOxPP, and AOPP datasets, respectively. The improvements in accuracy are statistically significant, as indicated by p-values below the 0.05 significance level (6.453e-5, 2.442e-03, and 0.0486, respectively). This further performance gain stemmed from the multi-view learning strategy, which enables the model to capture complementary features from both primary sequence and molecular graph representations, resulting in more robust and generalizable predictions of peptide antioxidant activity.Table 5. Comparative results of the five-fold CV of Multi-AOP with various machine learning algorithms and existing deep learning prediction models across the three benchmark datasetsDatasetModelFeatureNumber ofparametersAccuracyPrecisionSensitivitySpecificityMCCAnOxPePreddatasetDTAAC-0.67260.65030.68890.65750.3462KNNDDE-0.72600.71010.69090.72600.4517RFAAC-0.69400.66060.59500.76880.3697SVMDDE-0.73790.74480.69170.78140.4745XGBoostAAC-0.73310.71430.67410.75340.4654AnOxPePredone-hot encoding38 k0.64410.67880.69080.67030.3506AOPPADCA^a^0.18 M0.73150.71600.73610.72690.4652PepBERTembedding layer4.94 M0.75800.76800.71110.80140.5152Multi-AOPembedding layer0.75 M0.80430.82260.75560.84930.6086AnOxPPdatasetDTAAC-0.90330.90910.89620.90400.8067KNNAAC-0.90330.92960.87260.93400.8081RFAAC-0.93160.95520.90570.97170.8644SVMDDE-0.95280.97980.91510.98060.9083XGBoostAAC-0.86790.86110.87740.85850.7360AnOxPePredone-hot encoding38 k0.93160.94200.91980.94340.8634AOPPADCA0.18 M0.94100.95790.92370.95920.8837PepBERTembedding layer4.94 M0.92450.93690.91040.93870.8494Multi-AOPembedding layer0.75 M0.96840.98350.95280.98390.9373AOPPdatasetDTAAC-0.80200.702080200.81850.6040KNNAAC-0.84650.88600.79540.89770.6967RFAAC-0.86290.90650.83170.90760.7484SVMCTDC-0.88780.96080.80860.96700.7855XGBoostAAC-0.83830.85710.81190.87130.6775AnOxPePredone-hot encoding38 k0.87620.94030.79870.95380.7617AOPPADCA0.18 M0.90300.94220.86070.94560.8103PepBERTembedding layer4.94 M0.87290.91240.82510.92080.7493Multi-AOPembedding layer0.75 M0.90430.96460.83960.96900.8156^a^ADCA: feature concatenation of Amino Acid Composition (AAC), Dipeptide Composition (DPC), Compositionof K-Spaced Amino Acid Group Pairs (CKS), and Amino Acid Index (AAI).
Ablation study
Our Multi-AOP framework complemented primary sequence features extracted from the xLSTM sequence model with two-dimensional structural features from the MPNN graph model. To evaluate the contribution of each component, we conducted an ablation study. As shown in Fig. 3, we compared the performance of individual sub-models against our integrated approach across multiple metrics on all benchmark datasets. The sequence model consistently outperformed the graph model. However, our hybrid model achieved substantial performance improvements over both individual sub-models, demonstrating that the structural information encoded in two-dimensional molecular graphs provides complementary insights that enhance the predictive capability of sequence-based embeddings alone.
To further investigate the relative contribution of sequence features and structural features across different types of peptides, we categorized peptide sequences into short peptides ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$<15$$\end{document} amino acids) and long peptides ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ge 15$$\end{document} amino acids). This threshold represents the minimum length needed for stable secondary structure formation, as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha $$\end{document} -helices typically require 10–15 residues to stabilize. We hypothesize that because short peptides remain structurally flexible, their antioxidant activity is primarily determined by their amino acid composition. In contrast, longer peptides can form folded structures in which the spatial positioning of reactive groups becomes critical for their radical scavenging function. The test was performed on the merged dataset of the three benchmark data, yielding a dataset of 5,235 peptides, comprising 2,597 AOPs and 2,638 random sequences.
As shown in Table 6, the xLSTM-based sequence model achieved a higher accuracy of 0.8607 on short peptides, while the MPNN-based graph model performed better on long peptides with an accuracy of 0.8571. This indicates that sequence features dominate in shorter peptides, whereas structural features become more important as peptide length increases. Of note, our Multi-AOP framework showed reduced performance on long peptides. This performance gap may be attributed to the limitations of SMILES-derived molecular graphs for longer peptides, where the increased graph complexity introduces noise that can obscure relevant structural patterns.Table 6. Ablation study of Multi-AOP by peptide length on the merged dataset.Sequence lengthModelAccuracyPrecisionSensitivitySpecificityMCCShort peptides ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$< 15$$\end{document} aa)xLSTM0.86070.85350.85660.84490.7215MPNN0.79010.78400.79670.78360.5803Multi-AOP0.87780.85690.88280.85280.7360Long peptides ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$>15$$\end{document} aa)xLSTM0.73210.73330.75860.70370.4632MPNN0.85710.85290.90630.79170.7072Multi-AOP0.82140.80650.86210.77780.6431
Fig. 3. Ablation study. Performance comparison of Multi-AOP and its sub-models in predicting peptide antioxidant activity
External validation on newly curated data
To enhance the practical utility of our proposed model, we developed the final prediction model trained on the whole merged dataset containing 5,235 sequences. Furthermore, we evaluated the performance of models trained on individual datasets versus the merged one, while the 54 newly published AOPs were used as the external validation set. These 54 newly published AOPs collected from diverse sources including DEBP, BIOPEP-UWM database, Antimicrobial Peptide database, Plant-PepDB, and FermFooDB. The distributions of the length and amino acid composition are shown in Fig. 4 A and B. Note that these external sequences demonstrated minimal sequence homology with our training data, exhibiting an average sequence identity below 9%, as illustrated in Fig. 4 C. As presented in Table 7, our final model trained on the merged dataset achieved an impressive sensitivity of 0.9057 on this challenging external dataset, highlighting the benefits of data integration. While AnOxPePred and AOPP obtained sensitivities of 0.5849 and 0.7380, these performance comparison closely aligned with the results obtained on the test sets of the three benchmark datasets, providing evidence of the generalizability of our final AOP prediction model.Table 7. External validation performance of Multi-AOP trained on individual and merged datasetsValidation setTraining dataNumber of sequencesSensitivity54 newly published AOPsAnOxPePred dataset14040.5340AnOxPP dataset21200.8802AOPP dataset30220.8697Merged dataset52350.9057
Fig. 4. Performance evaluation of Multi-AOP on the newly discovered AOPs: A Distribution of sequence length, B Distribution of sequence amino acid composition, C Distribution of sequence identity between the combined dataset and the external dataset
Discussion and conclusion
In this work, we proposed Multi-AOP, a novel framework that integrates multi-view information to predict peptide antioxidant activity, achieving state-of-the-art accuracy on three benchmark datasets. Our approach leverages the strengths of both sequence-based and structure-based representations, demonstrating that hierarchical fusion of diverse molecular features can enhance predictive performance. Compared with pre-trained PLMs containing hundreds of millions of parameters, we designed a parameter-efficient xLSTM architecture to extract sequence features from amino acid sequences. Additionally, the intrinsically short length of AOPs makes it feasible to convert peptide sequences into SMILES representations, which enable the construction of molecular graphs for training GNNs to learn two-dimensional structural features. The sequence and graph features were then hierarchically fused for AOP prediction.
Our Multi-AOP framework shares conceptual similarities with several recently developed hybrid peptide prediction methods. Multi-Peptide combines PeptideBERT with Graph Neural Networks using a CLIP-inspired contrastive loss framework, achieving an accuracy of 88.057% (Badrinarayanan et al. 2024). However, this approach relies on the computationally expensive PeptideBERT backbone (420 million parameters). In contrast, Multi-AOP (0.75 million parameters) employs a lightweight xLSTM-based sequence encoder combined with SMILES-derived 2D molecular graphs.
Comparative evaluation with ML and DL prediction models demonstrated that our proposed model outperformed both traditional ML models and existing DL architectures. Specifically, Multi-AOP achieved accuracies of 0.8043, 0.9684, and 0.9043 on the AnOxPePred, AnOxPP, and AOPP datasets, respectively. To facilitate broader adoption, we trained a final prediction model on the merged dataset combining all three benchmarks. External validation confirmed the generalizability of this model on newly published AOP sequences. Beyond its predictive performance, Multi-AOP offers considerable practical value for in silico screening of large-scale peptide libraries, accelerating the identification of antioxidant peptides for drug discovery and nutraceutical development.
While our current framework successfully integrates sequence and two-dimensional structural information, several limitations warrant consideration. First, the model does not explicitly handle post-translational modifications (PTMs), as the training data primarily consist of unmodified peptide sequences. Addressing this limitation would require datasets containing modified residues and suitable molecular representations. Second, our SMILES-to-graph conversion via RDKit may fail for non-standard amino acids or unusual peptide structures, potentially limiting its applicability to unconventional sequences.
Furthermore, our ablation study revealed that Multi-AOP performs better on short peptides ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$<15$$\end{document} amino acids) than on longer peptides ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ge 15$$\end{document} amino acids). This performance gap suggests that two-dimensional graph representations are insufficient to capture the conformational complexity of longer peptides, where antioxidant activity may depend on three-dimensional spatial arrangements of functional groups. Recent advances in AI-based protein structure prediction, particularly AlphaFold (Bryant et al. 2022) and ESMFold (Lin et al. 2022), have made high-quality three-dimensional structural information readily accessible. In future work, we plan to integrate predicted 3D coordinates as additional node features in the graph representation to capture spatial relationships between atoms.
Finally, we aim to incorporate explainability techniques such as SHapley Additive exPlanations (SHAP) (Lundberg and Lee 2017) or attention visualization to identify key amino acids and structural motifs contributing to antioxidant activity. Such interpretable insights could guide rational peptide design by revealing sequence-activity relationships and highlighting functional residues critical for radical scavenging. Looking forward, Multi-AOP provides a foundation for sustainable peptide discovery by enabling rapid, resource-efficient screening of candidate AOPs. As experimental AOP databases continue to expand, retraining Multi-AOP on larger and more diverse datasets will further improve its predictive accuracy and generalizability. We envision Multi-AOP contributing to the development of next-generation natural antioxidants for food preservation, pharmaceutical formulation, and therapeutic applications.
Supplementary Information
Supplementary material 1.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Beck M, Pöppel K, Spanring M, Auer A, Prudnikova O, Kopp M, Klambauer G, Brandstetter J, Hochreiter S (2024) xlstm: Extended long short-term memory. ar Xiv preprint ar Xiv:2405.04517
- 2Choe E, Min DB (2005) Chemistry and reactions of reactive oxygen species in foods. Journal of food science, 70(9):R 142–R 15910.1080/1040839050045547416403681 · doi ↗ · pubmed ↗
- 3Irina Georgiana Munteanu and Constantin Apetrei (2021) Analytical methods used in determining antioxidant activity: A review. Int J Mol Sci 22(7):338010.3390/ijms 22073380 PMC 803723633806141 · doi ↗ · pubmed ↗
- 4Khezerlou A, pouya Akhlaghi A, Alizadeh AM, Dehghan P, Maleki P, (2022) Alarming impact of the excessive use of tert-butylhydroquinone in food products: A narrative review. Toxicology reports 9:1066–107510.1016/j.toxrep.2022.04.027PMC 976419336561954 · doi ↗ · pubmed ↗
- 5Li W, Liu X, Liu Y, Zheng Z (2025) High-accuracy identification and structure–activity analysis of antioxidant peptides via deep learning and quantum chemistry. Journal of Chemical Information and Modeling,10.1021/acs.jcim.4c 0171339772654 · doi ↗ · pubmed ↗
- 6Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. ar Xiv preprint ar Xiv:1710.10903,
- 7Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? ar Xiv preprint ar Xiv:1810.00826
- 8Yan K, Lv H, Guo Y, Peng W, Liu B (2023) samppred-gat: prediction of antimicrobial peptide by graph attention network and predicted peptide structure. Bioinformatics, 39(1):btac 71510.1093/bioinformatics/btac 715PMC 980555736342186 · doi ↗ · pubmed ↗
