From Clusters to Outcomes: Machine Learning-Based Phenotyping in Intermediate–High-Risk Acute Pulmonary Embolism

18 January 2026

Barkin Kultursay, Cihangir Kaymaz, Hacer Ceren Tokgoz, Murat Karacam, Berhan Keskin, Seda Tanyeri, Aykun Hakgor, Deniz Mutlu, Cagdas Bulus, Dicle Sirma, Seyma Zeynep Atici, Metehan Kibar, Seyma Nur Cicek, Aziz Vezir, Can Erdem, Zubeyde Bayram, Seyhmus Kulahcioglu, Ahmet Sekban, Ibrahim Halil Tanboga, Nihal Ozdemir

https://doi.org/10.1002/pul2.70243

Abstract

Intermediate–high-risk (IHR) pulmonary embolism (PE) is a heterogeneous condition in which guideline-based criteria may not fully capture the biologic and hemodynamic variability relevant to early deterioration. Data-driven phenotyping may improve risk stratification and support individualized decisions about reperfusion therapy. In this retrospective cohort study (2012–2025), 553 patients with guideline-defined IHR PE were analyzed using unsupervised machine learning (ML). Thirty-six demographic, clinical, laboratory, echocardiographic, and computed tomography (CT) variables were standardized and encoded as appropriate for clustering. Several clustering algorithms were compared, and the optimal model was selected on the basis of silhouette width and stability metrics. Clinical characteristics, imaging findings, treatment patterns, and outcomes were compared across phenotypes. The primary outcome was in-hospital mortality; the secondary outcome was long-term all-cause mortality. Multivariable logistic regression and Cox models were used to assess associations with outcomes, and pre- to post-treatment changes were evaluated. Two phenotypes were identified with the k-prototypes algorithm (silhouette width = 0.697). Cluster 1 (right ventricular [RV]-failure phenotype; n = 360) was characterized by younger age, lower systolic blood pressure, more severe RV dysfunction, higher thrombotic burden, and lower baseline tricuspid annular plane systolic excursion/pulmonary artery systolic pressure (TAPSE/PASP) ratios. Cluster 2 (comorbidity-dominant phenotype; n = 193) comprised older patients with more cardiovascular and metabolic comorbidities but relatively preserved hemodynamics. In-hospital mortality was 6.0% overall and was lower in Cluster 2 than in Cluster 1 (3.6% vs. 7.2%); Cluster 2 membership remained independently associated with lower early mortality (OR: 0.43; 95% CI: 0.19–0.98). The interaction term between CDT and cluster was not statistically significant. Both phenotypes showed significant improvement in RV function after reperfusion, with greater improvement in Cluster 1, including in the TAPSE/PASP ratio.
Over a median follow-up of 73.2 months, long-term mortality did not differ significantly between phenotypes (log-rank p = 0.11). Unsupervised ML identified two clinically meaningful IHR PE phenotypes with divergent early risk but comparable long-term outcomes. These findings suggest that phenotype-based assessment may refine risk stratification and help guide individualized decisions about CDT and other reperfusion strategies in acute PE.
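The abstract names the k-prototypes algorithm and silhouette width but gives no implementation details. The Python sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it implements Huang-style k-prototypes (squared Euclidean distance on z-scored numeric variables plus a weight `gamma` times the number of categorical mismatches, with column means and modes as prototypes) and chooses k by the highest mean silhouette width. The `gamma` value, the farthest-point initialization, reusing the clustering dissimilarity inside the silhouette calculation, all function names, and the toy records are assumptions; the study's comparison of algorithms and its stability metrics are not reproduced.

```python
from collections import Counter
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each numeric column (population SD); constant columns become 0."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in rows]


def dissim(num_a, cat_a, num_b, cat_b, gamma):
    """Huang-style mixed dissimilarity: squared Euclidean + gamma * category mismatches."""
    squared = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return squared + gamma * mismatches


def farthest_point_init(num, cat, k, gamma):
    """Assumed deterministic start: row 0, then the row whose nearest seed is farthest."""
    chosen = [0]
    while len(chosen) < k:
        candidates = [i for i in range(len(num)) if i not in chosen]
        chosen.append(max(candidates, key=lambda i: min(
            dissim(num[i], cat[i], num[c], cat[c], gamma) for c in chosen)))
    return chosen


def k_prototypes(num, cat, k, gamma=1.0, max_iter=100):
    """Alternate assignment and prototype update (numeric means, categorical modes)."""
    protos = [(list(num[i]), list(cat[i])) for i in farthest_point_init(num, cat, k, gamma)]
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: dissim(n, c, protos[j][0], protos[j][1], gamma))
               for n, c in zip(num, cat)]
        if new == labels:
            break  # assignments unchanged: converged
        labels = new
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                continue  # empty cluster: keep its previous prototype
            protos[j] = (
                [mean(num[i][d] for i in members) for d in range(len(num[0]))],
                [Counter(cat[i][d] for i in members).most_common(1)[0][0]
                 for d in range(len(cat[0]))],
            )
    return labels


def silhouette(num, cat, labels, gamma=1.0):
    """Mean silhouette width, reusing the mixed dissimilarity (an assumption)."""
    n = len(labels)
    dist = [[dissim(num[i], cat[i], num[j], cat[j], gamma) for j in range(n)]
            for i in range(n)]
    widths = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        others = set(labels) - {labels[i]}
        if not own or not others:
            widths.append(0.0)  # singleton (or single-cluster) convention
            continue
        a = mean(own)  # mean dissimilarity to the point's own cluster
        b = min(mean(dist[i][j] for j in range(n) if labels[j] == o) for o in others)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(widths)


# Hypothetical toy records, NOT study data: [age, systolic BP, TAPSE/PASP]
raw_num = [[52, 100, 0.25], [48, 98, 0.22], [55, 102, 0.27], [50, 96, 0.24],
           [78, 135, 0.45], [81, 140, 0.50], [76, 138, 0.48], [80, 132, 0.47]]
# Hypothetical categorical flags: [hypertension, diabetes]
raw_cat = [["no", "no"], ["no", "no"], ["no", "yes"], ["no", "no"],
           ["yes", "yes"], ["yes", "yes"], ["yes", "no"], ["yes", "yes"]]

num = standardize(raw_num)
# Pick k by the highest mean silhouette width over a small candidate range.
best_k = max(range(2, 5), key=lambda k: silhouette(num, raw_cat, k_prototypes(num, raw_cat, k)))
labels = k_prototypes(num, raw_cat, best_k)
print(best_k, labels)
```

For real analyses, a maintained implementation such as `KPrototypes` from the `kmodes` Python package, paired with a dedicated mixed-data distance (for example, Gower distance) for the silhouette, is likely a safer choice than this hand-rolled sketch.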

Other materials on this topic

Prospective Evaluation of Serial Biomarkers in Patients With Intermediate High Risk Acute Pulmonary Embolism: A Single Center Proof-of-Concept Study

Chronic Obstructive Pulmonary Disease and Pulmonary Hypertension: A Comparative Study of Biomarkers and Clinical Indicators

Knowledge Gaps and Controversies on Cardiopulmonary Exercise Testing in the Assessment of Pulmonary Vascular Disease: An Official Statement of the Pulmonary Vascular Research Institute Exercise and Right Ventricular Function Task Force

Relationship of Pulmonary Artery to Aorta Ratio With Pulmonary Vascular Resistance, Compliance, and Outcomes in COPD and Interstitial Lung Disease in PVDOMICS

More from Pulmonary Circulation
