Transformer Tokenization Strategies for Network Intrusion Detection: Addressing Class Imbalance Through Architecture Optimization


Aksholak G., Bedelbayev A., Magazov R., KAPLAN K.

Computers, vol. 15, no. 2, 2026 (ESCI, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 15 Issue: 2
  • Publication Date: 2026
  • DOI: 10.3390/computers15020075
  • Journal Name: Computers
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus, Aerospace Database, Compendex, INSPEC, Directory of Open Access Journals
  • Keywords: CICIDS2017, deep learning, intrusion detection, network security, tokenization, Transformer
  • Kocaeli University Affiliated: Yes

Abstract

Network intrusion detection poses challenges that fundamentally differ from the language and vision tasks typically addressed by Transformer models. In particular, network traffic features lack inherent ordering, datasets are extremely class-imbalanced (with benign traffic often exceeding 80%), and reported accuracies in the literature vary widely (57–95%) without systematic explanation. To address these challenges, we present a controlled experimental study that isolates and quantifies the impact of tokenization strategies on Transformer-based intrusion detection systems. Specifically, we introduce and compare three tokenization approaches—feature-wise tokenization (78 tokens, one per CICIDS2017 flow feature), a sample-wise single-token baseline, and an optimized sample-wise tokenization—under identical training and evaluation protocols on a highly imbalanced intrusion detection dataset. We demonstrate that tokenization choice alone accounts for an accuracy gap of 37.43 percentage points, improving performance from 57.09% to 94.52% (100 K samples). Furthermore, we show that architectural mechanisms for handling class imbalance—namely Batch Normalization and capped loss weights—yield an additional 15.05% improvement, making them approximately 21× more effective than increasing the training data by 50%. We achieve a macro-average AUC of 0.98, improve minority-class recall by 7–12%, and maintain strong discrimination even for classes with as few as four samples (AUC 0.9811). These results highlight tokenization and imbalance-aware architectural design as primary drivers of performance in Transformer-based intrusion detection and offer practical guidance for deploying such models in modern network infrastructures, including IoT and cloud environments where extreme class imbalance is inherent.
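To make the two tokenization families concrete, the following is a minimal numpy sketch (not the authors' code; `D_MODEL` and the projection matrices are illustrative assumptions) of how a single 78-feature CICIDS2017 flow record becomes a Transformer input sequence under feature-wise versus sample-wise tokenization:

```python
import numpy as np

N_FEATURES = 78   # CICIDS2017 flow features
D_MODEL = 64      # embedding width (illustrative assumption)

rng = np.random.default_rng(0)
flow = rng.normal(size=N_FEATURES)  # one preprocessed flow record

# Feature-wise tokenization: each scalar feature becomes its own token,
# so self-attention operates over a length-78 sequence.
W_feat = rng.normal(size=(1, D_MODEL))      # per-feature projection (assumed shared)
feature_tokens = flow[:, None] @ W_feat     # shape (78, 64)

# Sample-wise tokenization: the whole record is projected into a single
# token, so self-attention operates over a length-1 sequence.
W_samp = rng.normal(size=(N_FEATURES, D_MODEL))
sample_token = flow[None, :] @ W_samp       # shape (1, 64)

print(feature_tokens.shape)  # (78, 64)
print(sample_token.shape)    # (1, 64)
```

The 37.43-percentage-point gap reported above stems from this choice of sequence layout alone, with model capacity and training protocol held fixed.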
The study also presents a practical implementation scheme recommending sample-wise tokenization, capped class weighting, and Batch Normalization after the embedding and classification layers to improve stability and performance on highly imbalanced tabular IDS problems.
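The capped loss weights mentioned above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the inverse-frequency formula, the cap value of 10.0, and the example label counts are all assumptions, chosen only to show why capping matters when one class has just four samples:

```python
import numpy as np

def capped_class_weights(counts, cap=10.0):
    """Inverse-frequency class weights, clipped so that extremely rare
    classes cannot dominate the loss (cap value is an assumption)."""
    counts = np.asarray(counts, dtype=float)
    weights = counts.sum() / (len(counts) * counts)  # inverse frequency
    return np.minimum(weights, cap)                  # cap extreme weights

# Benign-heavy label distribution with a 4-sample minority class,
# mirroring the imbalance described in the abstract (counts are invented).
counts = [80000, 15000, 4996, 4]
w = capped_class_weights(counts)
print(w)  # the 4-sample class is capped at 10.0 instead of weighting 6250.0
```

Without the cap, the rarest class would receive a weight of 6250, destabilizing gradient updates; with it, minority classes are still up-weighted but within a bounded range.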