Predicting Cell-Penetrating Peptide Uptake Mechanism from Sequence: A Machine Learning Approach
Abstract
Cell-penetrating peptides (CPPs) are promising drug delivery vectors, but their therapeutic efficacy depends critically on their cellular uptake mechanism. CPPs can enter cells via energy-dependent endocytosis or energy-independent direct translocation, with profound implications for cargo delivery and bioavailability. While numerous computational tools predict whether a peptide is cell-penetrating, none predict the uptake mechanism itself. Here, we present the first machine learning model designed specifically to predict CPP uptake mechanism from amino acid sequence. We curated from the peer-reviewed literature a dataset of 142 CPPs with experimentally validated uptake mechanisms; after removing redundant sequences sharing >80% pairwise identity, 111 non-redundant peptides remained. Under nested 5-fold cross-validation with bootstrap confidence intervals, our best model, a support vector machine with a radial basis function kernel (SVM-RBF), achieved an AUC-ROC of 0.795 [95% CI: 0.711-0.872], an accuracy of 72.1%, and a Matthews correlation coefficient (MCC) of 0.447. Feature importance analysis identified hydrophobicity, leucine content, and basic residue ratio as key predictors, consistent with known biophysical mechanisms. Our model and dataset are freely available at https://github.com/Misterbra/cpp-mechanism-predictor.
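The abstract names the key sequence descriptors and the evaluation statistics but not their exact definitions. The following pure-Python sketch is illustrative only and is not the authors' implementation. It computes three descriptors of the kind highlighted (mean hydrophobicity, leucine fraction, basic-residue fraction) along with the reported metrics: MCC, and AUC-ROC with a percentile bootstrap confidence interval. Several choices here are assumptions: the Kyte-Doolittle hydropathy scale, counting histidine as a basic residue, and all function names. The SVM-RBF model and the nested cross-validation loop are omitted.

```python
import math
import random

# Kyte-Doolittle hydropathy values. ASSUMPTION: the abstract does not state
# which hydrophobicity scale the authors used.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
BASIC = set("KRH")  # ASSUMPTION: histidine counted as basic


def sequence_features(seq):
    """Return (mean hydropathy, leucine fraction, basic-residue fraction)."""
    seq = seq.upper()
    n = len(seq)
    hydropathy = sum(KD[aa] for aa in seq) / n
    leucine = seq.count("L") / n
    basic = sum(aa in BASIC for aa in seq) / n
    return hydropathy, leucine, basic


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """AUC-ROC as P(positive scores above negative), counting ties as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC; resamples with only one class are skipped."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # AUC needs both classes present
            stats.append(auc(yt, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

For example, sequence_features("RRRRRRRRR") (nona-arginine) gives a mean hydropathy of -4.5, a leucine fraction of 0.0, and a basic fraction of 1.0. In a real evaluation, the scores passed to auc and bootstrap_auc_ci would be the out-of-fold predictions collected from the outer cross-validation loop.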