Ensemble-Based Early Detection of Malaria via Explainable ViT-CNN Feature Fusion and SHAP


Özdemir E. Y., Koç C., Küçük K., Özyurt F.

JOURNAL OF IMAGING INFORMATICS IN MEDICINE, 2025 (SCI-Expanded, Scopus)

Abstract

Malaria is a parasitic disease that causes significant morbidity and mortality worldwide. Early diagnosis plays a critical role in controlling the disease; however, current microscopic diagnostic methods are limited because they require a high level of expertise and are time-consuming. In this context, AI-supported automated diagnostic systems stand out for their potential to provide fast and reliable diagnoses. This study proposes a hybrid model supported by explainable AI (XAI) to diagnose malaria from microscopic blood smear images. The proposed method combines the features extracted by a Vision Transformer (ViT) and an EfficientNet-based Convolutional Neural Network (CNN), then ranks and filters them by importance using the SHAP (SHapley Additive exPlanations) method. The most significant features are then used to train an ensemble model that combines the CatBoost, XGBoost, and Logistic Regression algorithms. According to the experimental results, the highest performance was achieved by combining the ViT_Large_16 and EfficientNetB0 architectures. The proposed hybrid model, with an accuracy of 95.61% and a processing time of 1.6 s, is approximately 204 times faster than traditional machine learning approaches. ViT-based models were also observed to best capture complex and delicate details in blood smear images. These results show that this hybrid approach, which offers both explainability and speed, is successful enough to be integrated into clinical applications for the early diagnosis of infectious diseases such as malaria.
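
The fusion and SHAP-based selection steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions: random arrays stand in for the ViT and EfficientNet embeddings, a single gradient-descent logistic regression stands in for the CatBoost/XGBoost/Logistic Regression ensemble, and SHAP values are computed with the closed form for linear models under feature independence rather than with a SHAP library. All array sizes, the planted signal, and the helper names (`fit_logreg`, `predict_proba`, `linear_shap`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two embedding sets (sizes are hypothetical).
n = 400
y = rng.integers(0, 2, size=n)          # 0 = uninfected, 1 = parasitized
vit_feats = rng.normal(size=(n, 16))    # placeholder for ViT features
cnn_feats = rng.normal(size=(n, 8))     # placeholder for EfficientNet features
vit_feats[:, 0] += 2.0 * y              # plant class signal in one ViT feature
cnn_feats[:, 3] += 2.0 * y              # and in one CNN feature

# 1) Feature fusion: concatenate the two embeddings for each image.
fused = np.concatenate([vit_feats, cnn_feats], axis=1)   # shape (n, 24)


def fit_logreg(X, y, lr=0.1, steps=500):
    """Gradient-descent logistic regression (stand-in for the ensemble)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def linear_shap(X, w):
    """SHAP values for the log-odds of a linear model, assuming independent
    features: phi[i, j] = w[j] * (X[i, j] - mean(X[:, j]))."""
    return w * (X - X.mean(axis=0))


# 2) Rank features by mean absolute SHAP value and keep the top k.
w, b = fit_logreg(fused, y)
shap_values = linear_shap(fused, w)
importance = np.abs(shap_values).mean(axis=0)
top_k = 5                                # hypothetical cut-off
selected = np.argsort(importance)[::-1][:top_k]

# 3) Retrain on the selected features only.
w_sel, b_sel = fit_logreg(fused[:, selected], y)
pred = predict_proba(fused[:, selected], w_sel, b_sel) >= 0.5
acc = (pred == y).mean()                 # training accuracy on synthetic data

print("selected feature indices:", sorted(selected.tolist()))
print("training accuracy:", round(float(acc), 3))
```

In this toy setup the two planted features (index 0 from the ViT block and index 19, i.e. CNN feature 3 after concatenation) should rank among the selected features. With real embeddings, the same ranking step would typically use a tree or kernel explainer from a SHAP library on the trained ensemble members, which is an assumption about the workflow rather than something the abstract specifies.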