COLOR: A compositional linear operation-based representation of protein sequences for identification of monomer contributions to properties
Akash Pandey, Wei Chen, Sinan Keten

TL;DR
This paper introduces a novel interpretable deep learning model for protein sequences that accurately identifies key motifs influencing properties, surpassing existing explainability methods and aiding biomaterial design.
Contribution
The paper presents a new DL model with interpretable steps and a quantitative metric for analyzing monomer contributions in protein sequences, improving motif identification accuracy.
Findings
Model achieves 22% higher explainability.
Identifies motifs that destabilize anti-cancer peptides.
Finds motifs that enhance antimicrobial activity by 50%.
Abstract
The properties of biological materials like proteins and nucleic acids are largely determined by their primary sequence. While certain segments in the sequence strongly influence specific functions, identifying these segments, or so-called motifs, is challenging due to the complexity of sequential data. Although deep learning (DL) models can accurately capture sequence-property relationships, the degree of nonlinearity in these models limits the assessment of monomer contributions to a property, a critical step in identifying key motifs. Recent advances in explainable AI (XAI) offer attention- and gradient-based methods for estimating monomeric contributions. However, these methods are primarily applied to classification tasks, such as binding site identification, where they achieve limited accuracy (40-45%) and rely on qualitative evaluations. To address these limitations, we introduce a…
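The abstract's point that nonlinearity obscures monomer contributions can be illustrated with a toy additive model. This is a minimal sketch and not the COLOR model itself: the weight table, the bias, and the function names are all hypothetical and chosen only for illustration. When a prediction is a plain sum over residues, each monomer's contribution can be read off directly, and a candidate motif is simply the window with the largest summed contribution.

```python
# Toy illustration only: this is NOT the COLOR model from the paper.
# The per-residue weights and the bias are made-up numbers.
HYPOTHETICAL_WEIGHTS = {"A": 0.1, "K": 0.8, "L": -0.3, "G": 0.0}
BIAS = 0.5


def monomer_contributions(sequence):
    """Return one additive contribution per residue in the sequence."""
    return [HYPOTHETICAL_WEIGHTS[residue] for residue in sequence]


def predict(sequence):
    """Predicted property = bias + sum of the per-monomer contributions."""
    return BIAS + sum(monomer_contributions(sequence))


def top_window(sequence, width):
    """Find the contiguous window with the largest summed contribution.

    Returns (start_index, segment, score).
    """
    contrib = monomer_contributions(sequence)
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda s: sum(contrib[s:s + width]))
    return best, sequence[best:best + width], sum(contrib[best:best + width])


if __name__ == "__main__":
    seq = "ALKKG"
    print(monomer_contributions(seq))
    print(predict(seq))
    print(top_window(seq, 2))
```

In a nonlinear model no such exact decomposition is available, which is why attention- or gradient-based estimates are used as approximations there.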
Taxonomy
Methods: Softmax · Attention Is All You Need
