TY - JOUR
T1 - AI-driven discovery of novel extracellular matrix biomarkers in pelvic organ prolapse
AU - Mi, Yanlin
AU - Cahill, Ben
AU - Yallapragada, Venkata V.B.
AU - Rotem, Reut
AU - O'Reilly, Barry A.
AU - Tabirca, Sabin
N1 - Publisher Copyright: © 2025 Mi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2025/10/1
Y1 - 2025/10/1
AB - Deep learning for protein function prediction faces significant challenges in identifying disease-specific proteins. We present Extracellular Matrix Protein Predictor (EPOP), an advanced transfer learning framework leveraging protein language models to decode disease mechanisms. Focusing on pelvic organ prolapse (POP), which affects up to 50% of women worldwide, EPOP demonstrates AI's power to reveal novel therapeutic targets. We developed a sophisticated fine-tuning protocol for the ESM-2 model, optimized for ECM protein prediction. Our architecture integrates specialized attention mechanisms with interpretability modules, trained on expertly curated and balanced datasets totaling 80,000 proteins (40,000 ECM and 40,000 non-ECM). The framework employs a novel validation strategy using a 16,000-sample independent test set and clinical proteomics data. EPOP achieved unprecedented performance (99.40% accuracy) in ECM protein classification, significantly surpassing traditional deep learning architectures (10.81% improvement over Transformer models, 21.71% over Long Short-Term Memory). Applied to clinical samples, our model revealed a previously unknown pattern of ECM remodeling, identifying 24 novel disease-associated proteins. Model interpretability analysis uncovered specific sequence motifs and structural features critical for ECM protein function, providing mechanistic insights into disease progression. EPOP demonstrates how advanced AI bridges molecular analysis and clinical applications, uncovering novel therapeutic targets. Its success suggests broader applications across ECM-related disorders, potentially transforming approaches to diseases affecting connective tissue architecture.
UR - https://www.scopus.com/pages/publications/105017946090
U2 - 10.1371/journal.pcbi.1013483
DO - 10.1371/journal.pcbi.1013483
M3 - Article
C2 - 41056361
AN - SCOPUS:105017946090
SN - 1553-734X
VL - 21
SP - e1013483
JO - PLOS Computational Biology
JF - PLOS Computational Biology
IS - 10
ER -