Simple Yet Powerful: Machine Learning-Based IoT Intrusion System With Smart Preprocessing and Feature Generation Rivals Deep Learning

Eren, Kazim; Küçük, Kerem; Özyurt, Fatih; Alhazmi, Omar

doi:10.1109/access.2025.3547642

Eren K. K., Küçük K., Özyurt F., Alhazmi O. H.

IEEE ACCESS, vol. 13, pp. 41435-41455, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 13
Publication Date: 2025
DOI: 10.1109/access.2025.3547642
Journal Name: IEEE ACCESS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Page Numbers: pp. 41435-41455
Kocaeli University Affiliated: Yes

Abstract

With the rapid advancement of deep learning, IoT intrusion detection systems have increasingly adopted deep learning models as the state-of-the-art solution because of their ability to handle complex data patterns. However, these solutions introduce the risk of overengineering, in which the complexity of the model outweighs its practical benefits. In contrast, classical machine learning techniques offer a more efficient alternative but are often overlooked due to a lack of focus on data preprocessing, which is critical for achieving optimal performance. Here we propose a classical machine learning system built around a Random Forest classifier paired with a novel feature extraction algorithm adapted from Explainable Boosted Linear Regression (EBLR). Our workflow emphasizes well-structured preprocessing pipelines, including missing data handling, categorical feature encoding, and multicollinearity reduction, paired with classical machine learning models. We evaluated our method on the ToN-IoT dataset, which contains a range of network traffic data sets and attack types. Experimental results show that our model achieves an area under the curve (AUC) score of 0.99 on both the training and test sets, with high performance across a variety of attack categories. The classifier achieves precision, recall, and F1 score values of 0.999, 0.988, and 0.994, respectively, for normal traffic detection, while maintaining strong performance for attack categories such as denial of service (precision: 0.985, recall: 0.977, F1 score: 0.981) and scanning (precision: 0.984, recall: 0.992, F1 score: 0.988). Injection and ransomware attack types also show precision and recall scores above 0.90. We also show that our method outperforms existing deep learning models. These results highlight that, when paired with appropriate preprocessing and feature engineering, classical machine learning models can still provide an effective solution for intrusion detection in IoT environments.
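
The per-class figures quoted in the abstract can be sanity-checked with a short calculation. The sketch below assumes the reported F1 scores follow the standard definition, the harmonic mean of precision and recall; the abstract does not state the formula, so this is an assumption, and the function name `f1_score` and the dictionary layout are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract
reported = {
    "normal": (0.999, 0.988),
    "denial of service": (0.985, 0.977),
    "scanning": (0.984, 0.992),
}

# Recompute F1 from the rounded inputs and round to three decimals
recomputed = {name: round(f1_score(p, r), 3) for name, (p, r) in reported.items()}

for name, f1 in recomputed.items():
    print(f"{name}: F1 = {f1:.3f}")
# normal: F1 = 0.993
# denial of service: F1 = 0.981
# scanning: F1 = 0.988
```

The denial of service and scanning values match the abstract exactly. The normal-traffic value differs by 0.001 from the reported 0.994, which is within what rounding precision and recall to three decimals before recomputing can produce.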