LengthLogD: A Length-Stratified Ensemble Framework for Enhanced Peptide Lipophilicity Prediction via Multi-Scale Feature Integration
Shuang Wu, Meijie Wang, Lun Yu

TL;DR
LengthLogD is a novel ensemble framework that improves peptide logD prediction accuracy by stratifying models based on peptide length and integrating multi-scale molecular features, aiding peptide drug development.
Contribution
The paper introduces a length-stratified ensemble approach with multi-scale feature integration and adaptive weighting, significantly enhancing peptide logD prediction, especially for long peptides.
Findings
Superior prediction performance across peptide lengths (R^2 up to 0.882)
Length-stratified strategy improves accuracy by 41.2%
Topological features contribute 28.5% to model importance
Abstract
Peptide compounds demonstrate considerable potential as therapeutic agents due to their high target affinity and low toxicity, yet their drug development is constrained by their low membrane permeability. Molecular weight and peptide length have significant effects on the logD of peptides, which in turn influences their ability to cross biological membranes. However, accurate prediction of peptide logD remains challenging due to the complex interplay between sequence, structure, and ionization states. This study introduces LengthLogD, a predictive framework that establishes specialized models through molecular length stratification while innovatively integrating multi-scale molecular representations. We constructed feature spaces across three hierarchical levels: atomic (10 molecular descriptors), structural (1024-bit Morgan fingerprints), and topological (3 graph-based features…
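The length-stratification idea in the abstract can be illustrated with a minimal sketch: route each peptide to a model trained only on peptides of similar length. Everything specific here is an assumption made for illustration, not the authors' implementation. The cut-offs `SHORT_MAX` and `MEDIUM_MAX` are hypothetical, a single numeric feature stands in for the multi-scale feature vector, a one-variable least-squares fit stands in for the ensemble with adaptive weighting, and the data is synthetic.

```python
from collections import defaultdict

# Hypothetical cut-offs in residues; the summary does not state the real ones.
SHORT_MAX = 5
MEDIUM_MAX = 15


def stratum(length):
    """Map a peptide length to a stratum label."""
    if length <= SHORT_MAX:
        return "short"
    if length <= MEDIUM_MAX:
        return "medium"
    return "long"


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # No spread in the feature: fall back to predicting the mean.
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def fit_stratified(records):
    """Fit one (slope, intercept) pair per length stratum.

    records: iterable of (length, feature, logd) tuples.
    """
    groups = defaultdict(list)
    for length, feature, logd in records:
        groups[stratum(length)].append((feature, logd))
    return {
        name: fit_line([f for f, _ in rows], [y for _, y in rows])
        for name, rows in groups.items()
    }


def predict(models, length, feature):
    """Route the query to the model of its own length stratum."""
    a, b = models[stratum(length)]
    return a * feature + b


# Synthetic toy data: (length, feature, logD). Values are made up.
records = [
    (3, 1.0, 0.5), (4, 2.0, 1.5), (5, 3.0, 2.5),
    (8, 1.0, -1.0), (10, 2.0, -1.5), (12, 3.0, -2.0),
    (20, 1.0, -3.0), (25, 3.0, -3.0),
]
models = fit_stratified(records)
short_pred = predict(models, 4, 2.0)
medium_pred = predict(models, 10, 4.0)
long_pred = predict(models, 30, 2.0)
```

In this toy data each stratum follows a different feature-to-logD trend, so one global line would fit none of them well. That is the motivation for stratifying, though the sketch says nothing about the size of the gain on real peptides.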
