International Journal of Machine Learning and Cybernetics, vol.17, no.5, 2026 (SCI-Expanded, Scopus)
The rapid expansion of IoT and IIoT networks has introduced new cybersecurity challenges, and traditional Intrusion Detection Systems (IDS) struggle to detect evolving cyber threats. Existing deep learning-based IDS often fail to capture both local spatial features and long-range dependencies in network traffic, which degrades threat detection. To address this gap, this study proposes a hybrid CNN-Transformer model that combines the complementary strengths of both architectures. The Convolutional Neural Network (CNN) component extracts localized feature patterns from sequential data, while the Transformer’s self-attention mechanism models the global, long-range dependencies between these patterns. The proposed model was evaluated on the Edge-IIoTset dataset and benchmarked against standalone CNN-only, Transformer-only, and Deep Neural Network (DNN) models. The CNN-Transformer model attains a weighted F1-score of 0.82, the highest among the evaluated models and equal to that of the standalone CNN, together with high Area Under the Curve (AUC) scores, including an AUC of 1.000 for critical attack classes such as DDoS-ICMP, DDoS-UDP, and MITM. While it matches the standalone CNN on weighted F1 (0.82), it shows a more robust classification profile; it also narrowly exceeds the Transformer-only model (F1: 0.81) and clearly outperforms the DNN (F1: 0.75). Explainable AI (XAI) techniques were also applied to improve transparency by identifying the features that serve as key indicators of specific attack types. This research provides a robust IDS framework for next-generation IoT cybersecurity and a basis for more intelligent and adaptive threat detection systems.
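The division of labour between the two components can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the kernel values, the example sequence `flow`, the identity query/key/value projections, the mean pooling step, and all function names are assumptions made purely for illustration. The sketch shows a 1D convolution producing local features over short windows, followed by single-head scaled dot-product self-attention in which every position attends to every other position.

```python
import math

def conv1d(seq, kernels):
    """Valid 1D convolution with ReLU: each output sees only a local window."""
    width = len(kernels[0])
    out = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        out.append([max(0.0, sum(w * x for w, x in zip(k, window)))
                    for k in kernels])
    return out  # one feature vector per window position

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a trained model would learn these projections.
    """
    scale = math.sqrt(len(tokens[0]))
    mixed, weights = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        attn = softmax(scores)
        weights.append(attn)
        mixed.append([sum(a * tok[d] for a, tok in zip(attn, tokens))
                      for d in range(len(query))])
    return mixed, weights

def hybrid_forward(seq, kernels):
    """Local features from the convolution, then global mixing by attention."""
    local = conv1d(seq, kernels)
    mixed, weights = self_attention(local)
    # Mean-pool over positions to get one vector for a downstream classifier.
    pooled = [sum(col) / len(mixed) for col in zip(*mixed)]
    return pooled, weights

# Hypothetical kernels (an edge detector and a smoother) and a made-up
# sequence of normalized traffic measurements.
KERNELS = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
flow = [0.1, 0.4, 0.2, 0.9, 0.8, 0.3, 0.7, 0.5]
pooled, weights = hybrid_forward(flow, KERNELS)
```

In this toy setup each convolution output depends on only three adjacent inputs, whereas each row of `weights` spans all window positions, which is the local-versus-global distinction the abstract describes. A practical model would use learned parameters, multiple layers, and a classification head, typically built with a deep learning framework.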