AI outperforms radiologists in diagnosing and predicting interstitial lung disease
Artificial intelligence is reshaping the clinical landscape of interstitial lung diseases (ILDs), offering a compelling alternative to traditional radiological assessments, which are often marred by subjectivity and variability. A sweeping review titled “Computer-Aided Evaluation of Interstitial Lung Diseases,” published in Diagnostics, underscores the growing diagnostic, prognostic, and monitoring capabilities of machine learning and deep learning systems in ILD care.
The review, conducted by a team of researchers from the Istituto Figlie di San Camillo in Cremona and the University Hospital of Parma, evaluates a wide spectrum of AI applications in ILDs, from early detection and risk stratification to disease progression tracking and outcome prediction. Their findings point to a paradigm shift in how radiologists and pulmonologists may collaborate with AI-powered tools to interpret high-resolution computed tomography (HRCT) and other clinical data more effectively.
How accurate are AI systems in diagnosing interstitial lung diseases compared to humans?
One of the most striking claims in the review is that AI-based classification models can meet or exceed the performance of expert radiologists in diagnosing ILDs, particularly in identifying patterns of idiopathic pulmonary fibrosis (IPF), usual interstitial pneumonia (UIP), and progressive pulmonary fibrosis (PPF). For example, a deep learning algorithm trained on over 1100 HRCT scans achieved a classification accuracy of 73.3%, outperforming human experts in nearly two-thirds of cases. The AI model processed 150 HRCT scans in just two seconds, highlighting not just accuracy but remarkable efficiency.
Another algorithm, known as CALIPER, was able to quantify key features such as fibrosis extent, honeycombing, and traction bronchiectasis with high correlation to pulmonary function tests (PFTs) and 12-month mortality outcomes. In a direct comparison with visual scoring by radiologists, CALIPER-derived biomarkers outperformed human assessments in prognostic value.
The review also examined support vector machine (SVM)-based models, content-based image retrieval systems, and convolutional neural networks (CNNs) for classifying and predicting ILD patterns. In nearly all instances, AI systems demonstrated consistent sensitivity and specificity across test cohorts, sometimes exceeding 90% in identifying fibrotic features.
Can AI predict disease progression and mortality in ILD patients?
Beyond diagnosis, the reviewed algorithms proved adept at forecasting disease progression, especially when HRCT imaging was supplemented with clinical and functional data. Tools like SOFIA (Systematic Objective Fibrotic Imaging Analysis Algorithm) and other deep learning models incorporated longitudinal data to calculate individualized probabilities of survival, functional decline, or transplant need over a three-year horizon.
For instance, one model trained on baseline and follow-up HRCT scans of patients with progressive fibrotic ILDs delivered a sensitivity of 73% and a specificity of 84% in predicting three-year mortality, figures that outpaced human performance in comparable visual assessments. Another model, using data from 468 patients over a 24-month follow-up period, significantly improved mortality predictions when AI-derived fibrosis measurements were added to conventional clinical markers.
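For readers unfamiliar with the two metrics, the short Python sketch below shows how sensitivity and specificity are derived from a confusion matrix. The patient counts are purely illustrative assumptions, chosen so the arithmetic reproduces the 73% and 84% figures; they are not taken from the review or from the underlying study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: patients who died and were flagged as high risk
    fn: patients who died but were not flagged
    tn: survivors correctly not flagged
    fp: survivors incorrectly flagged as high risk
    """
    sensitivity = tp / (tp + fn)  # share of deaths the model caught
    specificity = tn / (tn + fp)  # share of survivors correctly cleared
    return sensitivity, specificity


# Hypothetical cohort: 100 patients who died within three years, 100 who survived.
sens, spec = sensitivity_specificity(tp=73, fn=27, tn=84, fp=16)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this hypothetical cohort the model would miss 27 of 100 deaths and wrongly flag 16 of 100 survivors, which is why the two numbers are usually reported together.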
Moreover, the ability of AI to stratify interstitial lung abnormalities (ILAs), a precursor to ILDs, proved especially valuable. In one cited study, machine learning models predicted ILA progression with an AUC of 0.94, far exceeding what is typically achievable through visual interpretation alone.
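An AUC can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who progresses than to one who does not. The minimal Python sketch below computes it that way by comparing every pair; the risk scores are invented for illustration and do not come from the cited study.

```python
def auc(scores_pos, scores_neg):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores (higher means higher predicted risk of progression).
progressors = [0.9, 0.8, 0.7, 0.4]
non_progressors = [0.6, 0.3, 0.2, 0.1]

print(f"AUC = {auc(progressors, non_progressors):.2f}")
```

Here 15 of the 16 pairs are ranked correctly, giving an AUC of about 0.94. A value of 0.5 would mean the model ranks patients no better than chance.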
What are the main challenges in implementing AI in ILD clinical practice?
Despite the promise, the review stresses that AI in ILDs is not ready for widespread clinical deployment without overcoming several barriers. Chief among them is data heterogeneity. ILDs comprise a diverse group of rare diseases with variable radiological and pathological manifestations, making it difficult to train models on balanced, representative datasets. The authors advocate greater international collaboration in collecting standardized, annotated radiological and clinical data, ideally through centralized registries.
Another challenge lies in the lack of uniform HRCT acquisition protocols across imaging centers, which can hinder cross-site generalization of AI models. Differences in scanner types, reconstruction algorithms, and image quality can all affect model performance. Transparency in AI model training and validation processes will also be necessary to secure regulatory approval and clinician trust.
Lastly, the authors highlight that while AI tools have proven reliable in research environments, few have been tested in real-world clinical workflows. Prospective trials comparing AI-derived biomarkers against established staging systems like the GAP index (Gender, Age, Physiology) are needed to validate their clinical utility.
First published in: Devdiscourse

