From reactive to predictive: A pattern-aware framework for Kubernetes autoscaling with large language model integration

Duman, Canberk; Eken, Süleyman

doi:10.1016/j.jss.2026.112861

Journal of Systems and Software, vol. 237, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 237
Publication Date: 2026
DOI: 10.1016/j.jss.2026.112861
Journal Name: Journal of Systems and Software
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, ABI/INFORM, Compendex, INSPEC
Keywords: Horizontal pod autoscaling, Kubernetes, Large language models, Predictive scaling, Resource management, Time series forecasting
Kocaeli University Affiliation: Yes

Abstract

Kubernetes, a leading container orchestration platform, scales workloads through its Horizontal Pod Autoscaler (HPA). Because the HPA is reactive, adjusting replica counts only after a change in load has already been observed, it can cause resource inefficiency and performance degradation. This paper presents a Predictive Horizontal Pod Autoscaling (PHPA) methodology that integrates time series forecasting, hyperparameter optimization, and Large Language Model (LLM)-based pattern recognition. The study establishes a comprehensive taxonomy of workload patterns and develops a customized optimization strategy for each pattern. CPU-optimized forecasting models are systematically evaluated using temporal cross-validation and pattern-adaptive parameter selection. LLM integration automates pattern-based model selection, achieving a 37.4% improvement in prediction accuracy compared to single-model approaches. Evaluations with Gemini 2.5 Pro demonstrate 96.7% accuracy in automated pattern classification. Computational efficiency analysis confirms that the models can be deployed within typical Kubernetes resource constraints. Validation on real-world datasets (NASA HTTP logs and the Alibaba Cluster Trace) demonstrates model generalizability, with gradient boosting models achieving a mean absolute percentage error (MAPE) below 2% on production workloads. This research provides theoretical and practical foundations for the transition from reactive to proactive scaling.
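To make the reactive/predictive contrast and the evaluation terms above concrete, the following is a minimal illustrative sketch, not the authors' implementation. Only `hpa_desired_replicas` follows the scaling rule documented for the Kubernetes HPA (with tolerance, stabilization windows, and min/max bounds omitted). Everything else is an assumption made for illustration: `predictive_desired_replicas` (sizing for a forecast total CPU demand), the expanding-window form of temporal cross-validation, and the moving-average forecaster that stands in for the paper's tuned models such as gradient boosting. The CPU series is made up. LLM-based pattern classification is not shown.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def hpa_desired_replicas(current_replicas: int, current_util: float,
                         target_util: float) -> int:
    """Reactive rule documented for the Kubernetes HPA:
    desired = ceil(currentReplicas * currentMetric / targetMetric).
    Tolerance, stabilization windows and min/max bounds are omitted here."""
    return math.ceil(current_replicas * current_util / target_util)


def predictive_desired_replicas(forecast_total_cpu: float, target_per_pod: float,
                                min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Hypothetical predictive variant: size the deployment for the *forecast*
    total CPU demand (millicores) rather than the currently observed usage."""
    desired = math.ceil(forecast_total_cpu / target_per_pod)
    return max(min_replicas, min(max_replicas, desired))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def expanding_window_splits(n: int, initial_train: int,
                            horizon: int) -> Iterator[Tuple[range, range]]:
    """Temporal cross-validation (assumed expanding-window scheme): each fold
    trains on every point before the cut-off and tests on the next `horizon`
    points, so no future observations leak into training."""
    cut = initial_train
    while cut + horizon <= n:
        yield range(0, cut), range(cut, cut + horizon)
        cut += horizon


def moving_average_forecast(history: Sequence[float], window: int,
                            steps: int) -> List[float]:
    """Hypothetical stand-in forecaster (NOT the paper's models): repeat the
    mean of the last `window` observations for `steps` future points."""
    recent = history[-window:]
    level = sum(recent) / len(recent)
    return [level] * steps


# Usage on a made-up CPU series (total millicores per interval).
cpu = [400, 420, 410, 430, 450, 470, 460, 480, 500, 520, 510, 530]
scores = []
for train, test in expanding_window_splits(len(cpu), initial_train=6, horizon=2):
    pred = moving_average_forecast([cpu[i] for i in train], window=3, steps=len(test))
    scores.append(mape([cpu[i] for i in test], pred))

print([round(s, 2) for s in scores])
print(predictive_desired_replicas(1250, 300))  # ceil(1250 / 300) = 5 replicas
```

The design point the sketch isolates is that the reactive rule needs an already observed metric, while the predictive variant can scale ahead of demand whenever the forecast is accurate, which is why forecast error (here MAPE, scored fold by fold without look-ahead) matters for scaling quality.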