AI in Pulmonary Embolism Detection: Promise versus Practical Readiness
Accurate and timely diagnosis of pulmonary embolism is vital. Artificial intelligence has shown strong diagnostic performance in many published studies, but a growing gap separates those results from consistent performance in real-world hospital settings. This article summarizes what clinicians and health-system leaders need to know about that gap.
The Promise of AI: High Accuracy in Initial Studies
Internal validation studies frequently report high sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) for AI models detecting pulmonary embolism on CT pulmonary angiography. These figures often reflect algorithms trained and tested at a single center or on a limited dataset, with careful labeling and controlled imaging protocols. In that controlled environment, AI can flag emboli rapidly and with apparently high accuracy.
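To make those figures concrete, here is a minimal Python sketch of how such metrics are typically computed from a model's predicted probabilities. The labels, scores, and the 0.5 operating threshold are invented placeholders, not values from any particular study.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

# Hypothetical ground-truth labels (1 = PE present) and model scores
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.91, 0.12, 0.78, 0.44, 0.35, 0.08, 0.84, 0.55, 0.22, 0.59])

# Binarize at an illustrative 0.5 operating threshold
y_pred = (y_score >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)   # true-positive rate for the PE class
specificity = tn / (tn + fp)   # true-negative rate
auc = roc_auc_score(y_true, y_score)  # threshold-independent discrimination

print(f"Sensitivity {sensitivity:.2f}, Specificity {specificity:.2f}, AUC {auc:.2f}")
```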
The Reality Check: Generalizability Challenges
When algorithms are applied to external datasets from other hospitals, performance commonly declines. Reasons include variation in scanner models, image acquisition parameters, contrast timing, population prevalence, and disease presentation. Other contributors are selection bias in training cohorts, label noise, and overfitting to local patterns. These factors cause domain shift, so a model tuned to one system may miss or misclassify findings elsewhere.
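The "overfitting to local patterns" failure mode can be illustrated with a toy simulation. In the hypothetical sketch below, feature 0 is a genuine disease signal, while feature 1 stands in for a site-specific pattern (say, a scanner artifact) that correlates with the label only at the training hospital because of selection bias; all helper names, effect sizes, and data are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

def make_site(n, confounded):
    """Simulate one site's cases: feature 0 carries a weak true PE signal;
    feature 1 tracks the label only where `confounded` is True."""
    y = rng.integers(0, 2, n)
    signal = rng.normal(y * 1.0, 1.0, n)               # genuine, weak signal
    loc = y * 2.0 if confounded else np.zeros(n)       # site-specific shortcut
    artifact = rng.normal(loc, 1.0, n)
    return np.column_stack([signal, artifact]), y

X_tr, y_tr = make_site(4000, confounded=True)     # training hospital
X_int, y_int = make_site(1000, confounded=True)   # internal test set
X_ext, y_ext = make_site(1000, confounded=False)  # external hospital

model = LogisticRegression().fit(X_tr, y_tr)
print("Internal AUC:", round(roc_auc_score(y_int, model.predict_proba(X_int)[:, 1]), 2))
print("External AUC:", round(roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]), 2))
```

The model leans on the shortcut feature, so internal discrimination looks excellent while external discrimination drops toward what the weak true signal alone can support.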
What This Means for Healthcare Systems
Clinicians and administrators should treat published internal metrics as preliminary. External validation across multiple centers is a better predictor of real-world reliability. Practical considerations include local prospective testing, calibration of decision thresholds to local prevalence (as sketched below), workflows that keep clinicians in the loop, and plans for ongoing performance monitoring and incident review.
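As one concrete piece of that calibration work, here is a minimal sketch of the standard prior-shift correction, which rescales a model's predicted probability when local disease prevalence differs from the prevalence in the training data (assuming the class-conditional score distributions are otherwise unchanged). The prevalence figures are illustrative assumptions, not values from any study.

```python
def adjust_for_prevalence(p, train_prev, local_prev):
    """Rescale a predicted probability p for a different disease prevalence
    via the odds-ratio (prior-shift) correction."""
    r = (local_prev / (1 - local_prev)) / (train_prev / (1 - train_prev))
    odds = p / (1 - p) * r
    return odds / (1 + odds)

# Hypothetical: model trained where 30% of CTPAs were positive,
# deployed at a site where only 10% are.
print(round(adjust_for_prevalence(0.50, train_prev=0.30, local_prev=0.10), 2))
# ~0.21 -- the same raw score implies a lower post-test probability locally
```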
Moving Forward: Bridging the Gap
Improving generalizability will require diverse, multi-center datasets, transparent reporting of validation methods, and external prospective trials. Techniques like domain adaptation and federated learning can help, but operational readiness also depends on governance, clinician training, and post-deployment surveillance. Progress is underway, but broad clinical adoption should follow demonstrated, repeatable performance across the settings where the tool will be used.
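Federated learning, for instance, lets hospitals train locally and share only model parameters, never raw images. The sketch below shows a single FedAvg-style weighted-averaging step; the parameter vectors and site sizes are hypothetical, and real deployments layer on secure aggregation, privacy safeguards, and many rounds of local training.

```python
import numpy as np

def federated_average(site_weights, site_sizes):
    """One FedAvg aggregation step: average locally trained parameter
    vectors, weighted by each site's dataset size."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

# Hypothetical parameter vectors from three hospitals after local training
weights = [np.array([0.8, -1.2]), np.array([0.9, -1.0]), np.array([0.7, -1.4])]
sizes = [5000, 2000, 3000]
print(federated_average(weights, sizes))  # size-weighted global model
```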