From Clusters to Outcomes: Machine Learning-Based Phenotyping in Intermediate–High-Risk Acute Pulmonary Embolism

18 January 2026

Barkin Kultursay, Cihangir Kaymaz, Hacer Ceren Tokgoz, Murat Karacam, Berhan Keskin, Seda Tanyeri, Aykun Hakgor, Deniz Mutlu, Cagdas Bulus, Dicle Sirma, Seyma Zeynep Atici, Metehan Kibar, Seyma Nur Cicek, Aziz Vezir, Can Erdem, Zubeyde Bayram, Seyhmus Kulahcioglu, Ahmet Sekban, Ibrahim Halil Tanboga, Nihal Ozdemir

https://doi.org/10.1002/pul2.70243 

 

Abstract

Intermediate–high-risk (IHR) pulmonary embolism (PE) represents a heterogeneous group in whom guideline-based criteria may insufficiently capture biologic and hemodynamic variability relevant to early deterioration. Data-driven phenotyping may improve risk stratification and support individualized decisions regarding reperfusion therapy. In this retrospective cohort study (2012–2025), 553 guideline-defined IHR PE patients were analyzed using unsupervised machine learning. Thirty-six demographic, clinical, laboratory, echocardiographic, and CT variables were standardized and encoded as appropriate for clustering. Multiple algorithms were compared, and the optimal model was selected using silhouette width and stability metrics. Clinical characteristics, imaging findings, treatment patterns, and outcomes were compared across phenotypes. The primary outcome was in-hospital mortality; the secondary outcome was all-cause long-term mortality. Multivariable logistic regression and Cox models assessed associations with outcomes, and pre- to post-treatment changes were evaluated. Two phenotypes were identified using the k-prototypes algorithm (silhouette width = 0.697). Cluster 1 (RV-failure phenotype; n = 360) was characterized by younger age, lower systolic blood pressure, more severe right ventricular (RV) dysfunction, higher thrombotic burden, and lower baseline TAPSE/PASP ratios. Cluster 2 (comorbidity-dominant phenotype; n = 193) comprised older patients with more cardiovascular/metabolic comorbidities but relatively preserved hemodynamics. In-hospital mortality was 6.0% overall and lower in Cluster 2 than in Cluster 1 (3.6% vs. 7.2%); Cluster 2 remained independently associated with reduced early mortality (OR: 0.43; 95% CI: 0.19–0.98). The CDT–cluster interaction term was not statistically significant. Both phenotypes demonstrated significant improvements in RV function after reperfusion, with greater gains, including in TAPSE/PASP, in Cluster 1.
Over a median follow-up of 73.2 months, long-term mortality did not differ significantly between phenotypes (log-rank p = 0.11). Unsupervised ML revealed two clinically meaningful IHR PE phenotypes with divergent early risk but comparable long-term outcomes. These findings suggest that phenotype-based assessment may refine risk stratification and help guide individualized decisions regarding CDT and other reperfusion strategies in acute PE.
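
As a rough consistency check on the figures above, the short Python sketch below back-calculates approximate death counts from the reported cluster sizes and in-hospital mortality rates. The abstract does not report these counts, so `deaths1` and `deaths2` are assumptions obtained by rounding. The crude odds ratio computed here is unadjusted and is not expected to match the reported OR of 0.43, which comes from a multivariable model.

```python
# Reported in the abstract: cluster sizes and in-hospital mortality rates.
n1, n2 = 360, 193            # Cluster 1 (RV-failure), Cluster 2 (comorbidity-dominant)
rate1, rate2 = 0.072, 0.036  # 7.2% and 3.6%

# ASSUMPTION: death counts are not given in the abstract; they are
# back-calculated here by rounding n * rate to the nearest integer.
deaths1 = round(n1 * rate1)
deaths2 = round(n2 * rate2)

# Pooled in-hospital mortality across both clusters.
overall = (deaths1 + deaths2) / (n1 + n2)

# Crude (unadjusted) odds ratio for Cluster 2 versus Cluster 1.
odds1 = deaths1 / (n1 - deaths1)
odds2 = deaths2 / (n2 - deaths2)
crude_or = odds2 / odds1

print(f"Estimated deaths: cluster 1 = {deaths1}, cluster 2 = {deaths2}")
print(f"Overall in-hospital mortality: {overall:.1%}")
print(f"Crude OR (cluster 2 vs. cluster 1): {crude_or:.2f}")
```

Under these assumptions the pooled rate rounds to the reported 6.0%, and the crude odds ratio points in the same direction as the adjusted estimate.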
