CaliciBoost: Performance-Driven Evaluation of Molecular Representations for Caco-2 Permeability Prediction
Huong Van Le, Weibin Ren, Junhong Kim, Yukyung Yun, Young Bin Park, Young Jun Kim, Bok Kyung Han, Inho Choi, Jong IL Park, Hwi-Yeol Yun, Jae-Mun Choi

TL;DR
This study systematically evaluates eight types of molecular representation, combined with AutoML techniques, for Caco-2 permeability prediction. It identifies the most effective features and demonstrates the benefits of 3D descriptors and AutoML models.
Contribution
It introduces CaliciBoost, an AutoML-based model that outperforms existing methods in predicting Caco-2 permeability using optimized molecular features.
Findings
PaDEL, Mordred, and RDKit descriptors are highly effective.
Inclusion of 3D descriptors reduces MAE by 15.73%.
AutoML models achieve top performance in permeability prediction.
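The MAE figure above is a relative reduction, so it may help to see how such a percentage is computed. The following is a minimal sketch, not the authors' evaluation code: the function names `mean_absolute_error` and `relative_mae_reduction` and all numeric values are hypothetical toy inputs chosen for illustration, and none of them come from the paper.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_mae_reduction(mae_baseline: float, mae_new: float) -> float:
    """Percentage by which mae_new improves on mae_baseline."""
    if mae_baseline <= 0:
        raise ValueError("baseline MAE must be positive")
    return 100.0 * (mae_baseline - mae_new) / mae_baseline


# Toy permeability values (hypothetical, for illustration only).
y_true = [-5.0, -4.5, -6.0, -5.5]
pred_2d_only = [-5.4, -4.1, -6.4, -5.1]     # stand-in for a 2D-only model
pred_2d_plus_3d = [-5.3, -4.2, -6.3, -5.2]  # stand-in for a 2D + 3D model

mae_2d = mean_absolute_error(y_true, pred_2d_only)
mae_3d = mean_absolute_error(y_true, pred_2d_plus_3d)
reduction = relative_mae_reduction(mae_2d, mae_3d)
print(f"MAE 2D: {mae_2d:.3f}, MAE 2D+3D: {mae_3d:.3f}, reduction: {reduction:.2f}%")
```

Under this definition, the reported 15.73% would mean that the MAE with 3D descriptors is about 84.27% of the MAE obtained without them, assuming the paper uses the same baseline-relative convention.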
Abstract
Caco-2 permeability serves as a critical in vitro indicator for predicting the oral absorption of drug candidates during early-stage drug discovery. To enhance the accuracy and efficiency of computational predictions, we systematically investigated how eight types of molecular feature representation (including 2D/3D descriptors, structural fingerprints, and deep learning-based embeddings), combined with automated machine learning (AutoML) techniques, affect Caco-2 permeability prediction. Using two datasets of differing scale and diversity (the TDC benchmark and curated OCHEM data), we assessed model performance across representations and identified PaDEL, Mordred, and RDKit descriptors as particularly effective for Caco-2 prediction. Notably, the AutoML-based model CaliciBoost achieved the best MAE performance. Furthermore, for both PaDEL and Mordred representations, the incorporation of 3D descriptors…
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Bioinformatics · Pharmacogenetics and Drug Metabolism
Methods: Masked autoencoder · Feature Selection
