PLOS ONE, vol. 21, no. 1, January 2026 (SCI-Expanded, Scopus)
The clinical adoption of deep learning in dermatology requires models that are not only highly accurate but also transparent and trustworthy. To address this dual challenge, this study presents a systematic investigation into deep feature fusion, exploring how to effectively combine complementary representations from diverse neural network architectures. We design and rigorously evaluate six distinct fusion models, first investigating depth-wise and channel-wise strategies for integrating features from powerful Convolutional Neural Network (CNN) backbones, and subsequently incorporating the global contextual awareness of Vision Transformers (ViTs). Evaluated on the challenging 7-class HAM10000 dataset, our optimized architecture achieves a weighted-average precision, recall, and F1-score of 90% each, demonstrating superior diagnostic performance. Crucially, our comprehensive explainable AI (XAI) analysis using Grad-CAM and SHAP reveals that the fusion strategy directly dictates the model's clinical interpretability: our most effective models learn to base their predictions on salient dermatological features, such as border irregularity and color variegation, in a manner that aligns with expert reasoning. This work provides a robust framework and valuable architectural insights for developing the next generation of high-performing, clinically reliable, and transparent AI-powered diagnostic tools.
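The abstract does not specify the exact fusion operators, so the following is only a minimal sketch of two common readings of the strategies it names: channel-wise fusion as concatenation of spatial feature maps along the channel axis, and a pooled-descriptor fusion of global feature vectors. The backbone names, feature-map shapes, and random values are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Illustrative stand-ins for backbone outputs (shapes are assumptions):
# a CNN backbone's last feature map and ViT patch tokens reshaped to a grid.
rng = np.random.default_rng(0)
feat_cnn = rng.standard_normal((7, 7, 512))   # hypothetical CNN feature map
feat_vit = rng.standard_normal((7, 7, 768))   # hypothetical ViT token grid

# Channel-wise fusion: concatenate along the channel axis,
# preserving the spatial layout of both feature maps.
fused_channel = np.concatenate([feat_cnn, feat_vit], axis=-1)
print(fused_channel.shape)  # (7, 7, 1280)

# A pooled-descriptor alternative: global-average-pool each map to a
# vector, then concatenate the global descriptors before the classifier.
vec_cnn = feat_cnn.mean(axis=(0, 1))          # -> (512,)
vec_vit = feat_vit.mean(axis=(0, 1))          # -> (768,)
fused_vec = np.concatenate([vec_cnn, vec_vit])
print(fused_vec.shape)      # (1280,)
```

In a trained model, the fused tensor or vector would feed a classification head over the 7 HAM10000 classes; which fusion point works best is exactly the design question the study evaluates.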