Application of machine learning models in predicting the risk of thromboembolic events in patients with nonvariceal gastrointestinal bleeding
Department
Internal Medicine
Document Type
Article
Publication Title
World Journal of Gastroenterology
Abstract
BACKGROUND: Clinically, patients with nonvariceal gastrointestinal bleeding (NVGB) are prone to thromboembolic events, but the specific risk remains unclear.
AIM: To identify risk factors and evaluate the performance of five machine learning (ML) models in predicting the risk of thromboembolic events in patients with NVGB.
METHODS: This retrospective cohort study enrolled 866 patients from a tertiary hospital for model training and internal validation, and 282 patients from three other tertiary hospitals for external validation. These data were used to develop five ML models to predict the risk of thromboembolic events in patients with NVGB. After initial feature selection by training ML models, ten variables were selected to construct simplified ML models. Model performance was evaluated using accuracy, precision, sensitivity, specificity, F1-score, and area under the receiver operating characteristic curve. Calibration curves and decision curve analysis were used to further evaluate the predicted probabilities and net benefits of the models.
RESULTS: During hospitalization, the incidence of thromboembolic events was 25.61% in patients with NVGB. The categorical boosting (CatBoost) algorithm, which combined variable importance and SHapley Additive exPlanations values, identified 10 independent predictors of thromboembolic events: (1) History of anticoagulant drug use; (2) D-dimer level; (3) Age; (4) History of thromboembolism; (5) Length of hospital stay; (6) Intensive care unit (ICU) admission; (7) Hemoglobin level; (8) Use of hemostatic drugs; (9) Heart rate; and (10) Serum albumin level. We developed five simplified ML prediction models (L1 regularized logistic regression, random forest, support vector machines, extreme gradient boosting, and CatBoost) based on the above 10 predictors, which achieved areas under the receiver operating characteristic curve of 0.805, 0.804, 0.806, 0.746, and 0.815 in external validation, respectively. The performance of all five ML models significantly exceeded that of D-dimer alone in both internal and external validation. The CatBoost model demonstrated good calibration and accuracy, achieving the lowest Brier scores of 0.131 and 0.110 in the internal and external validation sets, respectively. Of the five models, the CatBoost model was considered the preferred choice in clinical settings.
CONCLUSION: The findings of this study enable effective and timely preventive interventions for high-risk patients, and help avoid unnecessary monitoring of low-risk patients.
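The evaluation metrics named in METHODS and RESULTS (accuracy, precision, sensitivity, specificity, F1-score, area under the receiver operating characteristic curve, and Brier score) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `evaluate_predictions`, the 0.5 decision threshold, and the toy labels and probabilities are assumptions chosen only for illustration, and the area under the curve is computed here with the pairwise-ranking (Mann-Whitney) formulation.

```python
from typing import Dict, List


def evaluate_predictions(
    y_true: List[int], y_prob: List[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute common binary-classification metrics from predicted probabilities.

    y_true holds 0/1 labels (1 = event), y_prob holds predicted event
    probabilities. The threshold is a hypothetical default, not a value
    taken from the study.
    """

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)

    # Brier score: mean squared difference between probability and outcome.
    brier = ratio(sum((p - t) ** 2 for t, p in zip(y_true, y_prob)), len(y_true))

    # AUC: probability that a random positive case receives a higher score
    # than a random negative case, with ties counted as one half.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg
    )

    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "auc": ratio(wins, len(pos) * len(neg)),
        "brier": brier,
    }


# Toy data for illustration only; these are not study data.
toy_labels = [1, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.4, 0.6]
metrics = evaluate_predictions(toy_labels, toy_probs)
```

A lower Brier score indicates that predicted probabilities lie closer to the observed outcomes, which is why the abstract reports it alongside the calibration results.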
First Page
115527
DOI
10.3748/wjg.v32.i3.115527
Volume
32
Issue
3
Publication Date
1-21-2026
Publisher
WJG Press
Medical Subject Headings
Humans; Gastrointestinal Hemorrhage; Retrospective Studies; Machine Learning; Female; Male; Risk Factors; Middle Aged; Aged; Thromboembolism; Risk Assessment; ROC Curve; Incidence; Anticoagulants; Aged, 80 and over; Predictive Value of Tests
PubMed ID
41640609
Recommended Citation
Lu, C., Cheng, H., Zhu, R., Zhou, Y., Sun, K., Xu, L., Sang, J., Chen, J., Yu, C., Qin, Y., & Li, L. (2026). Application of machine learning models in predicting the risk of thromboembolic events in patients with nonvariceal gastrointestinal bleeding. World Journal of Gastroenterology, 32(3), 115527. https://doi.org/10.3748/wjg.v32.i3.115527