Journal of Systems and Software, vol. 237, 2026 (SCI-Expanded, Scopus)
Kubernetes, a leading container orchestration platform, scales workloads through its Horizontal Pod Autoscaler (HPA). Because the HPA is reactive, it adjusts replica counts only after load has already changed, which leads to resource inefficiency and performance degradation. This paper presents a Predictive Horizontal Pod Autoscaling (PHPA) methodology that integrates time series forecasting, hyperparameter optimization, and Large Language Model (LLM)-based pattern recognition. The study establishes a comprehensive taxonomy of workload patterns and develops a customized optimization strategy for each pattern. CPU-optimized forecasting models are systematically evaluated using temporal cross-validation and pattern-adaptive parameter selection. LLM integration automates pattern-based model selection, achieving a 37.4% improvement in prediction accuracy over single-model approaches. Evaluations with Gemini 2.5 Pro demonstrate 96.7% accuracy in automated pattern classification. Computational efficiency analysis confirms that the models can be deployed within typical Kubernetes resource constraints. Validation on real-world datasets (NASA HTTP logs and Alibaba Cluster Trace) demonstrates model generalizability, with gradient boosting models achieving sub-2% MAPE on production workloads. This research provides theoretical and practical foundations for transitioning from reactive to proactive scaling.
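To make the reactive versus proactive distinction concrete, the following minimal Python sketch applies the standard HPA replica rule, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), first to an observed metric and then to a forecast one, and also shows how MAPE is computed. It is an illustration under stated assumptions, not the paper's implementation: the function names, the pod count, and all utilization values are hypothetical, and the sketch ignores the HPA's tolerance band and stabilization behavior.

```python
import math


def desired_replicas(current_replicas, metric_value, target_value):
    """Standard HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * metric_value / target_value)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical scenario: 4 pods with a 60% target CPU utilization.
observed_cpu = 75.0   # utilization measured now (reactive input)
forecast_cpu = 120.0  # utilization predicted for the next interval (assumed forecast)

reactive = desired_replicas(4, observed_cpu, 60.0)   # scales for the load already seen
proactive = desired_replicas(4, forecast_cpu, 60.0)  # scales for the load expected next

print(reactive, proactive)
```

In this made-up scenario the reactive rule provisions for the current load, while feeding a forecast into the same rule provisions ahead of the expected spike. How accurate that forecast is (for example, as measured by MAPE) determines whether the proactive decision is an improvement.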